Thursday, July 29, 2010

Researchers of tomorrow

Posting to different lists

All,

I haven't had time to look at this yet, but talking about how people search, etc., there is a study that came out by the British JISC called "Researchers of Tomorrow: A three year (BL/JISC) study tracking the research behaviour of 'Generation Y' doctoral students" available at http://www.jisc.ac.uk/news/stories/2010/07/generationY.aspx.

A chart on p. 32 is not very encouraging. They discovered that the share of those who cited the "internal library catalog" as the "main source" for their research peaked at what looks like 12%, and that was in the Arts and Humanities. It looks as if it goes down to 1% or 2% for another group that I cannot make out. Anyway, Google and Google Scholar are much higher.

There are several sections that seem interesting. I must point out that their report's conclusions seem a bit unsatisfying for me (at first glance) and I think some other conclusions could be drawn.

Anyway, it seems to be an important publication.

Wednesday, July 28, 2010

RE: Trust Online: Young Adults' Evaluation of Web Content

Posting to NGC4LIB

John,

What a thoughtful post. Thank you very much for sharing it. You emphasize some very interesting points and lay them out clearly.

To return to an earlier question I posed in another thread: what is (will be) the task of selection in this new environment? Every user with whom I have discussed these issues, from young person to emeritus scholar and researcher, wants information that is "reliable". That is not surprising but of course, it raises another question: what does "reliable" mean? One professor I spoke with last night told me that he *still* tells his students never to use Wikipedia. That is simply unfair to the students (who will use it anyway of course--or at least I hope they will!) and ignores reality. My own information literacy courses take another tack: I tend to say that all information resources are OK, but you must use each one wisely. One may be a great example of a primary source, another of biased information, another of obsolete, scholarly information, another of time-sensitive newspaper information, and so on. I don't have time to do much, but at least I try to treat students as adults and I don't tell them they should bow to the local censor.

But it's also clear that "easy to access" must be considered in this because of the results of the Northwestern study and the quote from Marcia Bates. One additional complexity that I have met with when I discuss these matters with former students who have finished school: we should also prepare our users to be able to handle information when they are out of school and no longer have access to our fabulously expensive databases. We should prepare them not for some ideal world, but for the real world they will encounter when they leave our institutions because otherwise, we must confess that we leave them almost helpless and completely reliant on things such as Google, when in reality, they will still have many options.

When you mention screening out the "fraudulent materials", this is what I had in mind. I think this is what people want, but "fraudulent" means different things to different people. And when you write:

<snip>
... how politically aware most undergraduate students in the U.S. are, and whether they have enough philosophical or political insight to adequately detect (increasing?) themes of political bias on the web.
</snip>

I would only add: it is just as important for them to see political bias in printed items and this unfortunate situation is not limited only to U.S. undergraduates. I have found graduates and full professors from around the world fall victim to this political bias. It seems that detecting different types of bias is becoming more important by the day (I think), but in my experience, they are not taught it. The idea of labelling sites as: conservative, liberal, etc. repels me as a professional, and although librarians may be pushed in that direction, I would try to leave that to crowdsourcing as much as possible. (I know that's a problem, too)

I still believe that Ross Atkinson's idea of the Control Zone could somehow be the foundation: http://tinyurl.com/33k2qtv. This paper is rather complicated and I think the basic idea can be simplified: it would mean creating a "library space" where most of the traditional library functions could be found. With today's technology, something like this could be achieved technically in a bunch of different ways, but the most difficult part is getting agreement by all the libraries involved, to coordinate selection, metadata creation and organization, etc.

This means change, and I don't know if libraries can change that much.

Monday, July 26, 2010

Trust Online: Young Adults' Evaluation of Web Content

Posting to NGC4LIB

I would suggest members of this list read a very interesting article: Trust Online: Young Adults' Evaluation of Web Content (International Journal of Communication 4, 2010) by researchers from Northwestern. From the abstract:

"We find that the process by which users arrive at a site is an important component of how they judge the final destination. In particular, search context, branding and routines, and a reliance on those in one's networks play important roles in online information-seeking and evaluation. We also discuss that users differ considerably in their skills when it comes to judging online content credibility".

I think this report fills a hole in the literature (at least so far as I have found) in that it has people evaluate "the search result" instead of individual websites. Readers of this list probably understand that the results you are presented with, shaped by the quest to become Google result #1, are an extremely complex part of the process of finding decent information. The report's results are not surprising: people do not understand the concept of "relevance" or how search engines work. While there is some awareness (apparently) that they should go to the "About Us" page, and similar parts of a site, it seems that people rarely do so. They discuss how people tend to go to the same sites over and over again.

But what interested me most was how people evaluated the search results, and what they noted:

"In some cases, the respondent regarded the search engine as the relevant entity for which to evaluate trustworthiness, rather than the Web site that contained the information. The following exchange between the researcher and a female social science major illustrates this point well:

Researcher: What is this Web site?
Respondent: Oh, I don't know. The first thing that came up"

I have seen this a lot myself, but I don't know if this indicates a level of "trust" or rather just the normal human failing of laziness. I have several times cited an excellent report by Marcia Bates, and I will again: "Improving User Access to Library Catalog and Portal Information" http://www.loc.gov/catdir/bibcontrol/2.3BatesReport6-03.doc.pdf where she writes:

"Principle of least effort.

Probably the single most frequently discovered finding on information seeking behavior is that people use the principle of least effort in their information seeking. This may seem reasonable and obvious, but the full significance of this finding must be understood. People do not just use information that is easy to find; they even use information they know to be of poor quality and less reliable--so long as it requires little effort to find--rather than using information they know to be of high quality and reliable, though harder to find. Research on this behavior dates at least as far back as the 1960s, when a major study demonstrated that physicians tended to rely on drug company salesmen for drug information, rather than consulting the research literature. (Coleman, Katz, & Menzel, 1967). Poole reviewed dozens of these studies in 1985 (Poole, 1985); Mann has a more recent review (Mann, 1992)."

Of course, I am guilty of this behavior myself. It seems we must accept that people will "exert the least effort" to get information, because what they get ready to hand will be "good enough". Back when people had no choice except to use well-selected libraries, using the card catalog to find peer-reviewed books was the "easiest" thing to do. People felt they could more or less "trust" what they read, but eliminating all of this selection and control for materials on the web doesn't mean that the work shouldn't still be done--it just shifts it onto the shoulders of the users. So, in the Information Literacy classes, we exhort our patrons to read the "About Us" pages, search out information about the author(s), check their credentials and so on, but to believe people will do this is extremely naive. I figure the only real result of librarians telling people to do all this work that we *know* they won't do, is to just lay (yet another) guilt trip on them. After all, we didn't expect them to do this kind of work with printed books and magazines--why should we believe they will do anything else today?

But to be fair to the users, Google does not allow for much filtering for these purposes, although recently they have allowed for different sorts of the records, e.g. based on time, "Related searches" and that inscrutable "Wonder Wheel" which does something I don't understand at all! Still, if there were a filter for something like "reliable information," I am sure lots of people would click it. According to the Northwestern report, it seems that many users believe that is what they are getting when they click on result #1 in Google or Yahoo: the most "reliable". Ultimately, I think this forms part of the popularity of some of these Web2.0 tools: people get recommendations from others they feel they can "trust".

Previously, people *believed* they were getting reliable information when they pulled a book off a shelf in a library, but in reality, that was not guaranteed at all. Something on the shelf might be reviewed, or peer-reviewed, or edited, or not, and the amount of reviewing and/or editing may be better or worse. Also, the information could be completely obsolete.

I still believe, however, that libraries can provide something unique that no other entity can today (although somebody probably will do this eventually and may even make tons of money off of it), and that is to provide some level of selection based on traditional librarian values. The article from Northwestern, I think, supports such a view because it shows that people are concerned about "quality content"; it's just too difficult and complicated to expect each person to torment themselves by going through the process of quality control over and over and over again.

What that system of "reliability" would be that we could supply, I do not know.

RE: RDA, translations & translators. WAS: Further RDA record examples

Posting to Autocat

On Sun, 25 Jul 2010 17:07:20 +0000, Riley, Jenn wrote:

>I definitely agree with John Attig in his analysis of what's necessary to distinguish one Expression from another, and that the Expression entity is a complex one. To me, this suggests a different approach, though. No matter what we do in terms of rules and implementation, we're never going to pack enough information into an Expression title to fulfill this distinguishing function. Nor should we. In some context or another, whatever features of the full Expression record we choose to smush into a title, it won't be enough and we'll need more information. Therefore I don't believe it's a good use of cataloger time to be formulating complex Expression titles according to even more complex rules.

An absolutely correct analysis. I think it is a mistake to become increasingly theoretical at this point in time when practical issues are coming more and more to the fore. The focus should be on using the time of the cataloger in the best ways possible. So for me, it all comes back to user needs: if it can be demonstrated that users need this additional access badly enough, a case could then be made that perhaps we should devote the additional cataloger resources to it, which would necessarily come at the expense of other things that users want, e.g. cataloging new items.

I was at a conference a few years ago and listened to an excellent presentation by a "strategic futurist" Wayne Hodgins, entitled "Perfecting the Irrelevant", and part of it is at: waynehodgins.typepad.com/ontarget/files/perfecting_the_irrelevant.pdf. He talks about how in 1997, Smith Corona made the best typewriter ever made, won tons of awards for it, and.... immediately shut down production. It turned out that people didn't want typewriters any longer.

Please understand: I am *not* declaring that providing additional information at this level of the "expression" is irrelevant to the task at hand; I am saying that we must simply find out whether it is or is not irrelevant to our users. At this juncture we must conclude that we do not know.

We also need to keep in mind that with today's tools, there are options we have never had before, such as collaborating with other agencies who may have different, or supplementary information, so as to display mash-up records for users, or there are even possibilities for all kinds of crowd-sourcing.

Much of this could be enacted rather quickly, but it involves almost a sea change of attitude among catalogers (including myself) concerning who should have "control" of the records, what constitutes high-standards, and so on.

Thursday, July 22, 2010

RE: RDA and the Library Discovery Experience (Was: What to stop doing)

Posting to Autocat

Shawne D. Miksa wrote:

<snip>
Jim wrote:
>No one has shown how RDA will help us produce quality records more quickly or more easily; no one has shown why other metadata communities will want to create RDA-enabled metadata although they don't want to provide us with an AACR2 equivalent now; no one has shown why our users will suddenly find what they want in library catalogs and come rushing back to us. We are just told, and we are supposed to just believe, that FRBR and RDA are the wave of the future.>

Jim--I just have to accept that I don't understand your arguments. I see contradictions. You want proof that RDA works, but you condemn it without proof either way. The argument would be more convincing if you were basing it on a comprehensive study. If you haven't made a comprehensive study then I look forward to when you do and when it is published so that we can all make an informed decision.
</snip>

Shawne,

Thank you so much for answering. If everything were equal and this were merely an intellectual exercise, I might agree with you, but it isn't. Switching to RDA will be a huge task, with tremendous human consequences and consequences for libraries, involving major disruptions of work, large outlays for staff retraining, equipment retooling, buying new documentation, and so on and so on, and we shouldn't forget that this comes during the most difficult economic moment in (at least most of) our professional careers, with libraries cutting hours, laying off staff, shutting branches and well, I don't want to think about that. In any case, I am sure that the costs of enacting RDA will force at least some libraries to cut staff in some institutions, and will eat seriously into the acquisitions budget of many others, with the result that our collections will be less useful to the patrons.

So the consequences of errors are very serious indeed, but we can say that we're the generals, and if you are going to win a battle or a war, casualties are inevitable among the troops. But I think it's important that we should make sure that the gains are worth the price of the casualties, otherwise the troops turn into cannon fodder.

There are other very serious challenges facing libraries and their catalogs. One challenge we never really had to consider in the past was that of competition for users. In the past if somebody wanted information, they could try to look it up in somebody's Funk & Wagnall's, see if it was in an almanac, or lacking those, they had no choice except to come to the library, where they were stuck. Now, there are genuine alternatives that people like and prefer to the things we make. So, in many ways, we are facing the proverbial "Perfect Storm" of disaster financing, plus serious competition from some of the most successful, dynamic, and profitable organizations in the world.

This only scratches the surface of the challenges libraries face. Because of all this, libraries and cataloging in particular simply must change or be swept away, of this I have no doubt. I believe that being swept away is a real possibility today for many, so this is not just an intellectual exercise. The decision to implement RDA will affect people's lives. I'm sure many out there have read what happened in New Jersey with the libraries, where they felt lucky to lose *only 43%* of their funding. http://www.libraryjournal.com/lj/home/885795-264/new_jersey_library_funding_.html.csp Would it be wise for them to institute RDA now? I'm sure there are lots of similar stories that readers of this list could share.

Therefore, it isn't a matter of whether we should change: the question is the direction. When it comes to catalog records, I think libraries have one thing and one thing only that we can possibly give: quality. Lousy records can be made automatically where one computer in a few hours can probably put out more records than an entire department in five years or so. But quality requires something: time. And when you have outrageous numbers of materials that need to be cataloged, e.g. websites that people want *now, today, this minute* something is needed. We need a redefinition of "quality" for one thing, but even more important, I believe that above all what we need for cataloging to weather this storm successfully is: help. And a lot of it. From what I have heard and experienced, cataloging departments of all sorts are being raided of staff and budgets, much as the Forum here in Rome was robbed of its marble and statues by the nobility, who could "put it to better use" in their private villas. Catalogers need help.

The direction of change offered by RDA and FRBR does not address any of the serious issues facing us today. For example, if somehow, RDA offered possibilities for significantly greater access than ever before, that might make people like our product more and I might be more amenable--but it doesn't. If RDA offered possibilities for much greater cooperation among other metadata communities we could up our production substantially, so I might go along, but I can't imagine any publisher going along with RDA. If RDA offered a more useful final product to our patrons, that may be all right, but the final product of RDA with the FRBR/1840s-Panizzi-CatalogueOfTheBritishMuseum displays will not serve the needs of my users, and while you can probably find somebody out there who likes them, as a professional, I find them terrifying.

If RDA led to greater productivity because records were easier to make, that might be possible, or (as I mentioned) if training could be made easier, we may be able to get more help, but none of that is going to happen, and nobody has ever suggested it, as you point out. Therefore, it seems as if catalogers won't get any help very soon. Sad.

For all of these reasons, it seems to me that adopting RDA would be the same as making a "Leap of Faith", much like Augustine's saying "Faith is not belief without proof, but trust without reservation." I must confess that I have lost the religion and have no faith or trust. RDA is not the wave of the future. No way. It's a stone axe in the days of lasers.

To me, the situation is crystal clear: the problems facing us are *not* with the cataloging rules, which is what RDA/FRBR deal with, but are in many, many other areas.

RE: RDA/ Library Discovery Experience/FRBR

Posting to Autocat

>Kevin Randall wrote:

> If it's still argued that FISO does not apply to what people are doing online, then I guess I feel just like Alice, fallen down through the rabbit hole or through the looking glass.

It isn't that FISO does not apply to what people are doing online, it's just that based on the very existence of Web2.0 tools and their popularity, plus other new internet technologies, other methods are overtaking FISO. Also, I want to make clear that I find these "other methods" to be potentially very dangerous. Let me explain.

When someone is not actively searching out information, but when this information is simply being spoon-fed to them and they can just click on a link, as happens with automatic recommendations, RSS feeds, shared bookmarks, and other similar tools, the person is essentially following roads that others have picked out for them. Therefore, if you are on a left-wing site that hates the Republicans, and you use only their tools, it is very possible you will never see any other side. Or, if you are on a Republican site, you may also never see any other sides. And let's face it, it would be crazy to expect links to the "enemy camp" from either side.

Corporations will present you with information that portrays them in a good light. They will never lead you to sites that say how bad they are, e.g. I haven't looked, but I am sure that BP's site tries to put the best face on everything, and does not give a full picture of the criticism they are undergoing. This shouldn't surprise anyone since it is only how the world works, but in a Web2.0 world, we see other consequences.

Therefore, the danger of Web2.0 tools (at least in my opinion) is that they could lead to the creation of isolated information islands that consequently leave people in the same isolated positions, getting only information from e.g. FoxNews and sites that agree with them, or only DemocracyNow and sites that agree with them, or specific bloggers that link only to sites they agree with. Of course, this occurs not only with politics; it could also happen in science, in all areas of scholarship, each replete with its own feuds, and in all areas of knowledge. Especially today, when people are angry about the economic meltdown, real and true schisms could occur, and in terms of the WWW, the Web2.0 capabilities may make it even worse. Of course, web search engines can be manipulated as well.

So, I am very skeptical of Web2.0 tools but I accept their utility and their popularity. They won't go away, nor should they. But right here I think librarian ethics (!! I know!) could become incredibly important. Here is the ALA code:
http://staging.ala.org/ala/aboutala/offices/oif/statementspols/codeofethics/codeethics.cfm and there are a few that stand out here:

II. We uphold the principles of intellectual freedom and resist all efforts to censor library resources.
III. We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
VI. We do not advance private interests at the expense of library users, colleagues, or our employing institutions.
VII. We distinguish between our personal convictions and professional duties and do not allow our personal beliefs to interfere with fair representation of the aims of our institutions or the provision of access to their information resources.

I think these principles can stand as an excellent foil to the potential dangers of Web2.0, and need to be much more seriously considered as we go into the future. They deal with many of the most difficult issues in the development of the web today. I know of no other field that has anything even remotely similar to this, unless of course, you actually believe Google's "Don't be evil" stuff.

Any tools we make must be made with these principles in mind and then I think the task becomes a bit clearer. For example, to make a tool that gives people an idea of the *wide array* of information on a given topic, and not only present information that we happen to agree with personally or that agrees with the institutions that pay us our salaries. Web2.0 will not do that, but I maintain that people and society need it.

It really is a great, but frightening time, to be a librarian!

RE: RDA and the Library Discovery Experience (Was: What to stop doing)

Posting to Autocat

On Wed, 21 Jul 2010 18:00:46 -0700, Daniel CannCasciato wrote:

>James Weinheimer wrote in part:
>
>" It's amazing that I vociferously state over and over again that *I do not know* what users are doing today, and I refuse to even make a guess as to how these people are interoperating with the information out there. "
>
>and that's the crux of our disagreement. We do know quite a lot about what our patrons are doing. There's all sorts of literature out there indicating not just what they do in searching, but also in interacting with data and each other. So I can't accept that we do not know. Of course we know.

I guess we will just have to agree to disagree. The reason I maintain we do not know what people are doing in this new information environment is that it is changing constantly. There are floods of new networks and information tools introduced constantly. What we may think we know today could be quite different in only two or three years, if not sooner. It is almost impossible to keep up with it all. These new tools are changing how people interact with the information: how they become aware of it (instead of actively seeking it out and finding it), and what they want from it (instead of identify, select, obtain).

Pretending to *know* what people are doing does not seem to me to be a wise course today. To play an important role in all of this, I believe we must keep our minds as open as possible, and this includes our attitudes toward user behavior. For only one example, when the books in Google Book Search become available (not "if", because eventually they will), libraries could wind up looking like those old ghost towns in the West. Will having zillions of worthwhile books--certainly enough to earn a B.A. or even a master's--available at the click of a button have an impact on user behavior? Undoubtedly. We must accept that our users will be quite different then. If we accept this, it only makes sense to assume that they were different 20 years ago.

My concern is: there are all of these networks and tools and we must assume they will become more and more popular. Libraries and their catalog records play *zero* role in them or next to nothing. This is a deeply ominous direction. In any case, librarians will have little impact on how user needs evolve and we need to focus our attention on areas where we can actually have an impact: our records and how they can be accessed.

<snip>
He also wrote: " In fact, I have run across several students who don't even understand what a search by author, title, or subject even means!"

Hey, many of us don't have a clue about electricity but volts, amperes, and grounded or ungrounded have spectacular meaning. So they don't understand author or title. They can learn it. Should they? I believe so.
</snip>

I believe they should learn it too; in fact, there are lots of things I intensely dislike about this new information society, and "lack of information skills" is one of them, but we must accept that not knowing author, title, subject is one of the areas where user needs have changed from 20 years ago and is evidence that FRBR is incorrect.

Not that this is anything new. Talk to reference librarians or read the literature from the past and you will find that the public has always had trouble with these things. There were huge debates over whether to have a unified catalog or a split catalog (all cards in one sequence, or separate author and subject catalogs). This had major consequences for access. Or they would get questions such as "Why does this card have Dostoyevsky's name typed in black at the top, while this other card has it in red?"

Try training members of the public yourself and you will notice a tremendous, unspoken hurdle: user expectations. I think this is something new. People will not automatically accept whatever you say. It's also a *lot* harder to teach this stuff than you may imagine and I suspect it was easier in a print environment to teach people how to use the catalog. But more importantly today, people are always mentally comparing what you are telling them with their favorable experiences in Google and many, if not most, just won't believe you. You must convince them that what you are teaching is *not* ancient history, and that Google *really can't* do something, that it all really matters and so on. This is what I have found to be truly difficult.

Even in those rare cases when you manage to convince them intellectually somehow, then comes the practical part of demonstrating it and you are stuck with these clunky OPACs, which are nothing more than digitized card catalogs. Very difficult to be convincing.

I think the real problem is that people do not have the slightest understanding of what is controlled vocabulary, how to use it, or how powerful it can be when it is used correctly. In other words, I see the problem as: How do we get the power of the catalog (which is basically, the use of controlled vocabulary) onto the web in a way that is useful to people who are essentially untrained? The current ways do not work and FRBR/RDA do not help.

I think there are ways of doing it, but any steps forward will be based on trial and error. The Semantic Web offers us a rare opportunity, but we must take that opportunity. There are some tools out there where I think we could have a major impact, e.g. dbpedia and https://subj3ct.com/.

In any case, trial and error means that progress can only be made incrementally so we must be understanding and patient when someone shares an attempt and it works only in 30% or 40% of the cases or so. That must be seen as a success, but only a step along a long path.

Wednesday, July 21, 2010

RE: RDA and the Library Discovery Experience (Was: What to stop doing)

On Wed, 21 Jul 2010 09:15:31 -0400, Myers, John F. wrote:

>I fail to see how the perceived lack of FRBR Group entities in Google negates the applicability of the FRBR user tasks. I am going to enter search terms into the Google box, relying on the Google algorithm to FIND a set of "hits" to resources for me; within that set of "hits" I am going to peruse the summaries of the associated resources to IDENTIFY one or more that seem promising; having identified such promising resources, I will further decide to SELECT one or more; and I will lastly click on the link(s) to OBTAIN the selected resources. Granted, in practice, it is more of a FIND and then IDENTIFY-SELECT-OBTAIN division. As the FRBR model is an intellectual construct though, I do not have any qualms with a division of the latter "hyper-task" into distinct constituent intellectual tasks, however closely they may actually take place -- to the point that I am not actually cognizant of them as distinct tasks while performing them.

It's too bad that everyone wants to focus on the users' activity and not on what they are finding. It's amazing that I vociferously state over and over again that I do not know what users are doing today, and I refuse to even make a guess as to how these people are interoperating with the information out there. Yet, others want to say that I am wrong--that we *know* what our users are doing. Sure, I'll agree that users *may be doing* what FRBR said back in the 1990s, but we can't be sure of it, and in my experience, users are doing something different by interacting with others, clicking on hyperlinks and surfing, sharing recommendations and bookmarks, RSS feeds, embedded documents and everything being sliced and diced to death. I am only sure of one thing: the only people who can possibly have a clue about how people work with information today are the reference librarians and specialist researchers, i.e. experts who work a lot with the public. I have some experience, but not nearly like others have, but I have noted that much of what I have seen others do, and I do myself, does not seem to fit into F/I/S/O. Still, it seems that if you have a big enough hammer you can make almost anything fit into anything else. (Remember the O-ring disaster with the Challenger shuttle?)

What is important with F/I/S/O is to accept that all of that part is out of our control: people will do whatever they do. What is in our control, and what is our task, however, is to create something that fulfills their needs. So my concern is with the "what" (i.e. the records we make) that users are interacting with (however they do it) and the "how" they are doing it. As I wrote before: they are interacting with information right now very successfully without any works, expressions, manifestations, or items, and without any authors, titles, or subjects. I stand by that. In fact, I have run across several students who don't even understand what a search by author, title, or subject even means! All they know is Google and Wikipedia, and they don't have anything like that.

These are the patrons we are getting now, and they will probably be a majority in the very near future if they aren't already: they are becoming more and more distant from our tools--tools that, as I pointed out, were designed in the distant past. I say that FRBR and RDA try only to extend this past instead of creating something new and useful for modern society. What we make is vital, I still believe it, but what is important is to make others believe it. I don't believe FRBR and RDA can do that.

RE: RDA and the Library Discovery Experience (Was: What to stop doing)

Posting to Autocat

On Tue, 20 Jul 2010 13:31:20 -0600, Truitt, Marc wrote:

>I wonder whether it's possible that we're all expecting too much from FRBR?
>
>I personally found Kevin Randall's real-world example of the FRBR relationships and user tasks both sensible and persuasive.
>
>FRBR is a model, nothing more. It attempts to describe a set of bibliographic relationships and user tasks.

FRBR has taken a tool (the library catalog) that has been created over the centuries through trial and error, then in the 19th century this tool was codified and took its present form, and has remained essentially unchanged even when computerization of the records took place. The fallacy was to use this tool to create a type of idealized model for the 20th-21st century that proclaims: this is what people do when they are searching for information; this is what they look for; and this is how they do it. These assumptions are wrong (although realizing it may have been almost impossible at the time, before the exponential growth and popularity of full-text keyword searches and then with the Web2.0 tools. I confess that when I read FRBR originally, I thought it was correct).

When FRBR claims that people "find/identify/select/obtain" it should not be forgotten that these are all transitive verbs: each one is incomplete and needs an object. Then, the "how" is addressed, for example: find (what?) expression (how?) author, or identify (what?) manifestation (how?) publication date. So, all the user tasks flow from one another logically and cannot be separated. When you do separate them, the whole matter becomes almost senseless, as Mike very humorously pointed out.

The problem is: all of this was taken from Cutter's Rules that came out a long time ago, and of course, he pretty much cribbed his rules from others who went even farther back like Panizzi and Hyde. Yet (and this is my answer to Kevin) millions of times every day, we know that people do not follow the FRBR user tasks to find information that is useful, and that many people (if not the vast majority) prefer using these new tools to our tools. (In fact, we can't even call these tools "new" any longer!) I will venture to suggest that the vast majority of information found today is not found through the FRBR user tasks.

How do we know that people are not following the FRBR user tasks in these other tools such as Google? Because they can't--it is simply not possible. There are no "works" or "expressions" in Google to be found. There is no possibility to search by author or title or subject. (The option to search in the "title" in the Advanced Search, only deals with information in the <head><title> section and is not the traditional idea of a title)

So, it is clear that Google has managed to build a tool that allows people to find information that is useful to them (at least I find it very useful!) and it has nothing at all to do with the FRBR user tasks. The popularity of Google and its like is more than obvious. We ignore this to our own peril. The only conclusion I can draw is that it is fallacious to generalize the FRBR user tasks into the larger world, and what FRBR actually does is to rephrase Cutter (that is, the traditional library catalog) in other terminology.

But doing this has led to some very knotty problems, such as the concept of a separate "work" which then has lots of "attributes". For Cutter and up until FRBR, the work was nothing more than a single collocating place in the catalog where the cards were filed (and later, records in the OPAC) and was not all that difficult. Now, in the FRBR model, it turns into a "work record" and takes on a whole load of other theoretical functions, becoming something highly complex. The same happens with expression and so on. To get an idea of the complexity, take a look at the very well-done "Functional Analysis of the MARC 21 Bibliographic and Holdings Formats"
http://www.loc.gov/marc/marc-functional-analysis/functional-analysis.html, in particular see the horrifying Table 3:
http://www.loc.gov/marc/marc-functional-analysis/source/table3.pdf. Just glancing through this table will provide an idea of how complex the work/expression/etc. is.

Adding these levels of complexity would be fine if it added substantially to access, but it does not. There will be exactly the same access as there has always been. Adding these levels of complexity would also be fine if it were demonstrated that this will answer the needs of our users, but it is clear that our users' needs have changed.

What are the users' needs today? I will not pretend to be competent to answer, but there is a lot of research being done by all kinds of experts and all kinds of organizations. Still, it has always been the case that people have found information by talking to one another, and recommending materials, by shelf browsing, by following citations. Anyone who has ever done any reference work with the public knows that the find/identify/select/obtain scenario is far too clean and neat. A type of "exploration" takes place, both in the information resources that the searcher peruses, as well as in the searcher's mind where they are constantly asking themselves: "What do I want? I'm not sure. Maybe this.. no..."

At base, I think the major problem with FRBR was to use later editions of Cutter's "Rules for a Dictionary Catalog" and not the very first edition that came out. (How's this for following FRBR?!) In Cutter's first version (and perhaps some other earlier editions, I don't know), there was a very important section (What Kind of Catalog) where he discussed the reasons for the catalog (Available at: http://tinyurl.com/2wnasmt). He lists a number of common questions asked in a library, and then proceeds to show how they can best be answered. These are the reasons for the catalog that he created and gives a purpose to the later rules and structure.

As a result, the catalog and its structure were very closely allied to the needs of the patrons. This is what I think needs to be done again, and lots of major organizations are doing precisely this, such as Google and LibraryThing and Microsoft and Mendeley and on and on and on. They really care about discovering what people want and need. Yet, libraries remain mired in the 19th century and earlier by insisting that people find/identify/select/obtain works/expressions/manifestations/items by their authors/titles/subjects, when every day all around us we see other methods and we even use them ourselves!

As real-world examples of how people find information today, I would suggest Andrew Abbott's "Library Research and Its Infrastructure in the Twentieth Century" http://hdl.handle.net/2142/14401, and people may be interested in my own experience in a post I made to RDA-L about how I (finally!) found that early edition of Cutter's Rules at:
http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg02048.html (The link to the Rules doesn't work there)

This is why we desperately need the input of reference librarians, to more clearly enunciate what our users need, but as I said, there is a lot of research out there already. FRBR and its associated RDA are just a far more complicated way to do what we are doing now, and we need other changes that more closely correspond to what our users need.

Apologies for the length of the post.

Tuesday, July 20, 2010

RDA and the Library Discovery Experience (Was: What to stop doing)

Posting to Autocat

On Mon, 19 Jul 2010 14:05:54 -0700, Steven C Shadle wrote:

>As for Jim's other points, I more or less agree with them (although I think I'm a little more hopeful that RDA can bring a little more order to the library discovery experience). --Steve

While I think that hope can be a good thing, so long as it is not taken to an extreme, hope can also take the form of wishing for a "deus ex machina": that somehow against all reason, something will just happen and all will be for the best. Unfortunately (and with all due respect, because I have great respect) I think this is what hopes for RDA happen to be.

No one has shown how RDA will help us produce quality records more quickly or more easily; no one has shown why other metadata communities will want to create RDA-enabled metadata although they don't want to provide us with an AACR2 equivalent now; no one has shown why our users will suddenly find what they want in library catalogs and come rushing back to us. We are just told, and we are supposed to just believe, that FRBR and RDA are the wave of the future.

I can't see how the training of new catalogers will get any easier; there are no new access points foreseen with FRBR/RDA. For example, I remember a rather recent exchange on the RDA list (I believe) concerning how to catalog treaties, and the mind-bending contortions to shoe-horn it into RDA, while the result was that the access points were *exactly the same* as they are now!

Why do it? Because it's the wave of the future.

FRBR and RDA offer exactly the same user experience as people get in the library catalog now, *except* they will get the FRBR displays that merge the works/expressions/manifestations/items together, but I think I have shown that this is what the old printed catalogs of the 19th century (and earlier) did. While I am sure there are better examples, here is one in the (excellent!) catalog "Index to the catalogue of books in the Upper hall of the Public Library of the city of Boston" made by Jillson and Vinton. Take a look under the listing they made for Samuel Johnson's books: http://tinyurl.com/2wrbta3.

Compare this to these FRBR examples: http://www.loc.gov/marc/marc-functional-analysis/multiple-versions.html#displays and you will see how eerily similar they are.

While I confess that FRBR appeals to the historian side of my nature, my everyday, practical librarian side realizes this *will not help* any of my users, who have terrible problems with the library catalog. Just a few minutes' work with these people shows that their problems are completely different from what RDA and FRBR address. And let's face it: would these displays really help anyone on this list in their own information needs? (Not their *cataloging* needs, but their *personal information* needs)

Do you really "find/identify/select/obtain": "works/expressions/manifestations/items" by their "authors/titles/subjects"? Or do you do something else? Personally speaking, very occasionally I actually do follow the FRBR user tasks, but I suspect that is because I am a librarian and a cataloger. Mostly I do something else. My users do something else. (This is a totally different topic)

The 19th century is truly over, and we must move on. When people have the option and the apparent "ease" (I put this in quotes on purpose) of searching full-text and relevance ranking, FRBR displays just do not cut it. And this means RDA. I firmly believe that the records we make are vital in today's world, but many assumptions must be rethought completely.

It brings me no joy to point out these issues, but I really believe they are true and they must be addressed before the libraries make the decision to commit major resources to creating what seems to be the equivalent of a typewriter in the age of word processors and laser printers.

Monday, July 19, 2010

RE: What to stop doing

Posting to Autocat

On Sun, 18 Jul 2010 09:28:45 -0400, Amy Turner wrote:

>Sally, you wrote in 2006 about what we NEED to stop doing--maintaining duplicate records in multiple local catalogs. WorldCat Local shows that this is possible, but I'm not quite ready to jump on board because of limitations in OCLC capabilities for automated authority control. But, I think we will eventually look back in amazement at discussions of maintaining local subsets of the same basic body of work.
>
>Meanwhile, it is really hard to stop doing anything major, in spite of the pressures you mention. At Duke, we have a goal to spend less time on print to make more time for electronic resources, and I'm not sure how we will accomplish this. Shelf-ready books will help, and I have proposed that we class all monographic series together (historic classed together series are about our last local practice). We have already streamlined and streamlined ...

This is a good point, but I don't know if sharing a single master record (à la OCLC WorldCat Local) is in everyone's best interests, although it may be the only practical option. The problem is: for this to work, we need to have real, genuine *standards* that everyone *must follow* or be penalized. This is how standards work in other fields, e.g. plumbing, roofing, automobiles, where if something doesn't follow the standard, the company is liable and the individual tradesman is at risk.

As I have written in other posts, standards in cataloging don't work this way and never have. If there is a subquality record, a cataloger takes it and fixes it up locally, but nothing much happens to whoever made the record or to the institution responsible, except that people at other institutions say bad things.

If we are stuck with a single record as in WorldCat Local, it becomes difficult to fix it up locally. Records are made and updated by anybody, from student to master cataloger. I realize that there is now a provision that makes it easier to change the master record, but this could lead to just as many problems as it does improvements, since the updaters will still be anybody, from student to master cataloger.

For something like this to exist, I think there is little choice but to insist on some sort of certification, as happens in most other endeavors. To be an electrician, you must demonstrate and maintain your skills, and if you do a bad job, you and the company that employs you, are damaged.

Also, how does this figure in to the WorldCat Record Use Policy? Now that the books are in Google books, it seems that libraries really would lose ownership of the last resource they own, their catalogs.

Thursday, July 15, 2010

RE: [RDA-L] Consolidated ISBD and RDA double punctuation

Posting to RDA-L

Bernhard Eversberg wrote:

<snip>
That would mean back to the drawing board, and is unrealistic.

One has to ask, however, what the future role of ISBD can or should be.

The difficulty of harmonizing ISBD with RDA and MARC results from the
fact that ISBD has no clearly defined element set. Every area has
a "Contents" section listing the elements of which it consists,
but these elements can consist of smaller parts or repeating sub-elements
which are introduced only in the text and examples. In the glossary,
one may even find elements ("Avant-titre") not referred to in the text.
</snip>

Well, in one sense there is a rather clearly defined data set. It's just that it's defined a bit differently.

I shall make up the following coding:
<isbd-record>
<area type="1">
Voina i mir
<code type=" = ">
War and peace
</code>
<code type=" : ">
a dual language edition
</code>
<code type=" / ">
by Leo Tolstoy
</code>
<code type=" ; ">
translated by Joe Smith
</code>
<code type=" [] ">
electronic resource
</code>
</area>
</isbd-record>

Any ISBD cataloger in the world immediately understands this. That is quite a feat, and should be built on. I think it is what ISBD really wants to do and shows clearly the huge advantage of not using words, but codes that are understood by everyone.

My own opinion is that ISBD, instead of focusing on punctuation, should be establishing standards for guaranteeing record transfer using XML, a type of "exchange format" where someone can transfer an ISBD record (or the part of the record following ISBD) from any library catalog into any other library catalog (and potentially any other service that wants to use it). In the past, it was achieved with punctuation and placement on the card; in today's environment, there are different methods (not necessarily "better" methods but different).

This format *could* still use the punctuation, as I have shown, but this is rather bizarre and would probably need to change somehow. Naturally, it does not need to render on the screen for everyone this way, since each database manager could display " = " however he or she wanted.
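To make the idea of "each database manager could display it however he or she wanted" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the record is my made-up coding from above (with tag names adjusted so that it parses as well-formed XML), and the function name render_area and the override table are hypothetical, not part of ISBD or of any existing system.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, modelled on the made-up coding in the post
# (tag names adjusted so that it parses as well-formed XML).
RECORD = """
<isbd-record>
  <area type="1">
    Voina i mir
    <code type=" = ">War and peace</code>
    <code type=" : ">a dual language edition</code>
    <code type=" / ">by Leo Tolstoy</code>
    <code type=" ; ">translated by Joe Smith</code>
    <code type=" [] ">electronic resource</code>
  </area>
</isbd-record>
"""


def render_area(area, overrides=None):
    """Render one area as a display string.

    By default each element is prefixed with its own ISBD punctuation.
    `overrides` (an assumption for this sketch) lets a local system map a
    punctuation code to whatever display text it prefers.
    """
    overrides = overrides or {}
    parts = [(area.text or "").strip()]
    for code in area.findall("code"):
        mark = code.get("type")
        value = (code.text or "").strip()
        if mark.strip() == "[]":
            # Square brackets enclose the value instead of preceding it.
            parts.append(f" [{value}]")
        else:
            parts.append(f"{overrides.get(mark, mark)}{value}")
    return "".join(parts)


area = ET.fromstring(RECORD).find("area")
print(render_area(area))
print(render_area(area, {" = ": " (parallel title) "}))
```

The first call reproduces the traditional ISBD display; the second shows a local system replacing " = " with its own label while the exchanged record itself stays the same.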

Tuesday, July 13, 2010

RE: More verbs. Electronic 'Items' (Yes, another FRBR thread)

Posting to open-bibliography

Dan Matei wrote:

<snip>

> For these sorts of reasons, I personally have major theoretical problems with the concept of "manifestation" especially with practical effects when applied generally. I see the "manifestation" primarily as a throwback to the catalog/unit card, and I think there are far better ways of handling them with modern tools.

Name them for this case, please.
</snip>

I believe that the problem lies with the very concept of the "manifestation": as I have tried to show here and in other places, it has always been merely a matter of *definition* and not a matter of any fact. There has always been this dichotomy in cataloging anyway, e.g. to use FRBR terminology:

Work/Expression (are abstractions. I cannot point to the *work* "Crime and Punishment" or any "Expression")
vs.
Manifestation/Item (are physical things. I can point to an Item containing "Crime and Punishment" and these items are combined into the "Manifestation")

Library practice *assumes* that specific changes in the physical item mean a change in the abstraction (which has always been known to be incorrect, but otherwise, the task was impossible). What are these specific changes? What I mentioned before: changes in title page transcription, dates, and so on. Different organizations have different definitions for this, and even in AACR2/LCRI cataloging, there was a major change in recording paging practices (we stopped counting plates in many cases) so that one day a book with plates was considered a different manifestation, but after this new directive, such books suddenly became copies. (I remember how horrified I was! :-) ) As I have tried to show with the LCRI 1.0 and the ALA guidelines (plus many others out there) determining if something is a new edition or copy (or in FRBR terms manifestation or item) is both difficult to learn and to do, plus it is semi-capricious.

So, this led me to consider if there really is such a thing as a "manifestation" and if not, what is it? I have decided that the manifestation is a continuation of the hand-made catalog/unit card, which was used to summarize a collection's holdings. In reality, the "manifestation" is nothing more than a *group display of the items*. There is no single way of defining the group, and determining manifestations manually brings additional problems and can be very difficult to teach.

[Since I am the historian, I want to point out that in the past, there was a lot of interest to keep the number of cards to a minimum, for various reasons, and often, there were notes leading the searcher to the "main entry card" where other editions etc. could be found. I tried to find a good example in Princeton's scanned catalog, but could only find this: http://tinyurl.com/25ms8u3 where it says, For more information see main card.]

My question is: if the manifestation is only a matter of definitions (IF title, publication information, dates etc. are all the same, then it is a duplicate; or if the date is within x number of years, it is a copy, and so on and so on) it seems as if this would be a perfect candidate for automatic sorting and display. If people (or computers) create the metadata record, they would always copy *exactly what they see* and instead of puzzling out which "manifestation" this item belongs to manually, let the machine sort out the displays.

What would this mean in reality: most information currently in the manifestation would go to the item, and then when processing the item (using xml, rdf, RDBMS or whatever) any information that is the same as in another item would be replaced with a URI to that information. As far as displays of the "group of items" goes, that could be left to the discretion of each database manager.
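As a rough illustration of "let the machine sort out the displays", here is a minimal sketch in Python. Everything in it is an assumption made for the example: the item records, the field names, and the two rule sets are hypothetical, and it shows only the grouping step, not the replacement of shared information with URIs.

```python
from collections import defaultdict

# Hypothetical item-level records, each transcribed exactly as seen.
ITEMS = [
    {"id": "item-1", "title": "Crime and punishment",
     "publisher": "Example Press", "date": "1991", "pages": "559 p."},
    {"id": "item-2", "title": "Crime and punishment",
     "publisher": "Example Press", "date": "1991", "pages": "559 p."},
    {"id": "item-3", "title": "Crime and punishment",
     "publisher": "Example Press", "date": "1991",
     "pages": "559 p., [8] leaves of plates"},
]

# Two hypothetical local definitions of "the same manifestation".
STRICT = ("title", "publisher", "date", "pages")  # plates make a difference
LOOSE = ("title", "publisher", "date")            # plates are ignored


def group_items(items, fields):
    """Group item ids by the values of the fields a given definition compares."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item.get(field) for field in fields)
        groups[key].append(item["id"])
    return dict(groups)


for name, rule in (("strict", STRICT), ("loose", LOOSE)):
    print(name, list(group_items(ITEMS, rule).values()))
```

The same three item records fall into two groups under the first definition and one group under the second, which is the point: the items never change, only the definition used for the display does.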

I think instituting something like this would make it far easier both for creating records and training, while everything would be much more accurate than what we have now.

But again, I don't think anything will change on this.

RE: More verbs. Electronic 'Items' (Yes, another FRBR thread)

Posting to open-bibliography

Graham,

It looks like a wonderful project you are working on and should help a lot of people. And it's very interesting that we are seeing a return to the old scribal traditions of the pre-printing days, a tradition that has been forgotten. Today we take for granted that with additional editions (1st, 2nd, 3rd etc. editions) we end up with successive *improvements* of the text, but we should remember that this idea of "improvement" has only been with us since the introduction of printing. Before printing, it was entirely turned around: the idea was that the farther away from the original text we were, the more *corrupt* the text became, because of all the errors in hand copying. Therefore, in a world with all manuscripts the task was to recreate the original version. Amazing how that is being transferred to the situation of today!

My own opinion is that OPMV http://open-biomed.sourceforge.net/opmv/ns.html is a step in the right direction, and although there are places to describe where, when and how changes took place and who did them, I see no provision there for detailing *what those changes were*, which is what people really want. To get an idea of a traditional manuscript collation, there is a good discussion at: http://www.skypoint.com/members/waltzmn/Collations.html with good examples, http://www.skypoint.com/members/waltzmn/Collations.html#Samples. This same basic method is used for early printed books as well although they also consider how the book was put together.

The traditional library determinations of an edition/manifestation that I pointed out before are based much more on changes in the physical item than on changes in the text, i.e. if the transcription of the title page, formal edition statement, dates (within certain limitations), physical paging, and series statement is all the same, it is *assumed* to be the same edition/manifestation and is therefore handled as an item. But of course, the text inside could be slightly different or even completely different because librarians do not have the time to compare texts so thoroughly. The opposite holds as well: if something on the title page, dates, etc. is *different* it is considered a new edition even though the text may be completely the same. (This has resulted in several scams by unscrupulous publishers, by the way; plus it happens more honestly with US vs. UK publications) In librarian terminology, this is called "content vs. carrier". Library tradition, under pressure of productivity, has almost always concentrated much more on carrier.

Naturally, the traditional collation methods of manuscripts cannot be used on web materials and I think the librarian emphasis on carrier also does not serve well in a digital environment. Still, the final product of a manuscript collation can be pretty nice, since it details the changes very clearly. Modern tools can recreate these things automatically, e.g. in the Wikipedia History pages http://tinyurl.com/33khhjj where you can select any versions you want and the changes are displayed very clearly.

In your case, could you do something similar to the Wikipedia history page by doing file compares or something?
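By "file compares" I mean something along these lines. This is only a minimal sketch in Python using the standard difflib module; the two short texts and the function name collate are made up for the illustration, and it is of course not how Wikipedia itself produces its history pages.

```python
import difflib

# Hypothetical texts standing in for two versions of the same document.
VERSION_1 = [
    "In the beginning was the text.",
    "It was copied by hand.",
    "Errors crept in.",
]
VERSION_2 = [
    "In the beginning was the text.",
    "It was copied by a scribe.",
    "Errors crept in.",
]


def collate(old_lines, new_lines):
    """Return a unified diff listing what changed between two versions."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="version-1",
            tofile="version-2",
            lineterm="",
        )
    )


for line in collate(VERSION_1, VERSION_2):
    print(line)
```

Lines beginning with "-" are readings found only in the earlier version and lines beginning with "+" only in the later one, which is roughly what a traditional collation records by hand.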

Monday, July 12, 2010

RE: More verbs. Electronic 'Items' (Yes, another FRBR thread)

Posting to open-bibliography

Cataloging is just one use of bibliographic data. Citations are another important use. While different formats (e.g. Postscript & PDF) should nominally be identical, there's no guarantee, so I'd argue that each different format should be indexed as a separate manifestation (or equivalent in the schema of your choice).

But this becomes a problem for comprehensibility and means a lot of work. My example was always this page of Thomas a Kempis' Imitation of Christ:
http://www.ccel.org/ccel/kempis/imitation.html. Is this one "thing" or several? (This site has actually gotten rid of several formats over the years, compare http://web.archive.org/web/20030924082152/http://www.ccel.org/ccel/kempis/imitation.html)

Also, with materials in sites such as the Internet Archive, how do we deal with them:
http://www.archive.org/details/imitationofchris00newy

Is this 8 manifestations? That's a lot of extra records and work. Is it best for the readers and is it worthwhile to deal with them this way? And what happens when the Archive decides to automatically create, let's say XML versions for every document. Are these all going to be different manifestations?

For these sorts of reasons, I personally have major theoretical problems with the concept of "manifestation" especially with practical effects when applied generally. I see the "manifestation" primarily as a throwback to the catalog/unit card, and I think there are far better ways of handling them with modern tools.

But I realize we are probably stuck with what we have.

RE: More verbs. Electronic 'Items' (Yes, another FRBR thread)

Posting to open-bibliography in answer to a question about what constitutes an FRBR "manifestation"

The guidelines followed by the Library of Congress, and therefore, by most other Anglo-American libraries are in "LC Rule Interpretation 1.0 Decisions Before Cataloging", which can be found at the Cooperative Cataloging Rules Wiki at:

http://sites.google.com/site/opencatalogingrules/aacr2-chapter-1/1-0--decisions-before-cataloging---rev

Even before you begin to do anything, you must ask whether the specific item you are working with is a copy of something already in the catalog, or something new. That rather simple task is actually very complex.

They ask two apparently simple questions:

Before creating a bibliographic record, determine what is being cataloged. Answer these two questions:
1) What aspect of the bibliographic resource will the bibliographic record represent?
2) What is the type of issuance of that aspect?

Figuring out the answers takes you down some interesting paths. At the end of the Rule Interpretation, you will see some interesting guidelines that say what differences are allowed between one "manifestation" and another, e.g. different ISBNs do *not* make for different manifestations.

I think it is important to realize that there is a great deal of variation between different agencies (publishers, libraries, other organizations) concerning what determines a manifestation and many libraries have their own interpretations as well. (For example, is a photocopy a new item or a new manifestation? Different libraries do different things) In any case, each field and each institution has different needs and I do not see agreement on any of this whether FRBR and/or RDA is accepted or not.




Message from the next day

I realize I forgot to mention "Differences Between, Changes Within: Guidelines on When to Create a New Record" from ALA (http://www.acrl.org/ala/mgrps/divs/alcts/resources/org/cat/differences.cfm), which also differs from the LC practice.

As a result of this multiplicity of guidelines, which allow (or do not allow) for all kinds of variants, it is difficult to know precisely what information a specific resource has: what is the ISBN on the item, the number of pages, the precise dates on the item, and so on.

Friday, July 9, 2010

RE: Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting on NGC4LIB

Jimmy Ghaphery wrote:

<snip>
It's a tough sell, that only a deep researcher might appreciate
http://www.google.com/search?q=caravaggio+paintings
vs
http://tinyurl.com/39o77fr

I do hear you though Jim. Google is a commercial entity despite the do no evil mantra. I actually think we are barking up the tree that Alex was tossing our way. I too would love to see a high quality unbiased and ethical library search that rivals the features of commercial search. I just don't see us as terribly close, which is especially frustrating as I too believe the information and services we provide are vital.
</snip>

I agree: it is a *very* tough sell and my concern is that we are doing nothing about it, except for RDA, which--let's face it--will cost a lot in time and money, and won't change anything at all for our users.

<snip>
The following article was very eye opening to me on the evolution of commercial search in terms of what is under the hood and constantly changing:
"On most Google queries, you're actually in multiple control or experimental groups simultaneously,"
http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1
</snip>

Thank you for this very interesting article. A few quotes caught my eye:

"The data people generate when they search — what results they click on, what words they replace in the query when they're unsatisfied, how their queries match with their physical locations — turns out to be an invaluable resource in discovering new signals and improving the relevance of results. The most direct example of this process is what Google calls personalized search — a feature that uses someone's search history and location as signals to determine what kind of results they'll find useful."

"Throughout its history, Google has devised ways of adding more signals, all without disrupting its users' core experience. Every couple of years there's a major change in the system — sort of equivalent to a new version of Windows — that's a big deal in Mountain View but not discussed publicly. "Our job is to basically change the engines on a plane that is flying at 1,000 kilometers an hour, 30,000 feet above Earth," Singhal says."

"That same year, an engineer named Krishna Bharat, figuring that links from recognized authorities should carry more weight, devised a powerful signal that confers extra credibility to references from experts' sites."

A lot of this really bothers me and reminds me of the famous Bismarck (apparently apocryphal) quote about laws and sausages: "Laws are like sausages. You should never watch them being made." I guess search engine algorithms are of the same type!

These people, who seem to be engineers and I am sure mean well, are wielding incredible power and potentially, they could have even more. For example, who decides who is a "recognized authority"? And location helps to determine your information needs? Like, you are in the poor part of a town vs. a rich part of a town? Or a poor country vs. a rich country? And I found it very interesting that they do these *incredible* changes to the information that is served up to you, but they work very hard to make sure you don't notice it.

Definitely food for thought and consideration.

RE: Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

Jimmy Ghaphery wrote:

<snip>
One of the initial points of this thread had the audacity to say: "Stop Bashing Google." I think there is a middle ground between blindly
accepting Google as perfect and treating it as useless.

When I search for "Caravaggio paintings" (without quotes), I am not aware of any library catalog that does as a good of a job with this
search query.

I get a nice cluster of images, top three links are not bad at all...
caravaggio.com
en.wikipedia.org/wiki/Caravaggio
www.wga.hu/html/c/caravagg/index.html

Latest news on Caravaggio, current and from respected newspapers.
...
</snip>

Perhaps I need to clarify: what I wrote (or at least meant) was that the people who make websites are learning ever more subtle methods to manipulate Google results to their own advantage. This is happening right now, with entire businesses created for this purpose (see the Google search for Search Engine Optimization http://tinyurl.com/37kfmj6). As it continues, the searcher for "Caravaggio paintings" will get results that are *increasingly* useless, i.e. search results that actually serve the purposes of *those who wish to manipulate the results* you see vs. what is really and truly relevant for your informational needs.

While the results in Google may be interesting and useful, we must contrast them with a search on a subject in a library catalog, e.g. browse search for subject "Caravaggio, Michelangelo Merisi da, 1573-1610." http://tinyurl.com/39o77fr

I won't say that one result is necessarily *better* than the other, but each is different and useful. One, however, is definitely more "unbiased and ethical" and has the advantage of allowing searchers to not worry about some unscrupulous cataloger trying to get as many dupes as possible out there to open their wallets or to twist people's minds in some way, which happens all the time in Google, although very few people realize it. On the other hand, it has the normal problems of anything created by humans, and of course, it is unrealistic to expect people to do browse searches in this way any longer.

What we are providing is useful and I believe, vital. It doesn't mean the other ways are no good, but somehow we need to figure how to bring out the power of each.

Wednesday, July 7, 2010

RE: Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

Bernhard Eversberg wrote:

<snip>
Would libraries not be marginalized very quickly if all materials that exist in digital formats were to be set free? In that unlikely
event, local collections lose most, if not all, of their appeal, except as repositories of physical objects some people might want
to inspect as such and physically.
</snip>

and

<snip>
And as soon as everything is open and accessible, the need for selection goes away. Rather, it becomes a "user task", as FRBR
in its sublime wisdom has already pinned down.
</snip>

That's an interesting subtlety on the FRBR task of "select" that I hadn't considered. I don't know if that's what the originators intended(!), but...

I don't know if it's really correct that the need for selection goes away when everything is open and accessible. Definitely, it changes dramatically, but as I read in a recent report of the Research Information Network "If you build it, will they come? How researchers perceive and use web 2.0" (still reading it) at http://www.rin.ac.uk/web-20-researchers, there is a quote on pdf p. 7 under Barriers and constraints:

"But a second major set of barriers revolve around perceptions of quality and trust. Both as producers and consumers of information, researchers seek assurances of quality; and many of them are discouraged from making use of new forms of scholarly communications because they do not trust what has not been subject to formal peer review. A significant minority of researchers believe that peer review in its current forms will become increasingly unsustainable over the next five years, and nearly half (47%) expect that it will be complemented by citation and usage statistics, and user ratings and comments. But at present they do not see such measures as an adequate substitute for peer review. "

This is no surprise, and I don't think I need to prove that when they have the choice, people will opt for "quality" information over "no-quality" information (i.e. no one will choose information that is considered to be lousy over information that is considered to be good--if both are equally accessible). This issue of "quality" is a major obstacle for many using the web today. I have found that often researchers are reluctant to place their materials into an open archive, even though they make no money at all publishing through a traditional publisher, because they are worried that people will label their work "inferior". Scientists and physicists have pretty much gotten beyond this concern.

While the definition of "quality" will change, probably as alternatives to traditional peer review prove themselves, I think people will always want it. In some shape or form this will be one of the tasks of "selection" in the future. Another aspect of selection that I predict will probably arise will be "appropriateness" e.g. the search for "Michelangelo frescos" should have filters for texts appropriate for children, novices, adults, experts, and so on. This would probably come the closest to traditional library selection, since you would be doing this with your "user community" in mind.

Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

First, I agree with many things you say.

Alexander Johannesen wrote:

<snip>

Still, it has always been my opinion that the main part of a library that determines its worth is its selection.

Then I think you're in bigger trouble than you think. The only libraries that comes well out of this are those where researches and historians hang out. Maybe that's a worthy consolation price, though. But the library as a public service to the masses won't hold up to selection; it is this *very* selection that makes people go straight for Amazon.com instead of putting themselves on the 3 month waitingqueue for some popular and selected book.
</snip>

My own experience proves otherwise. I think that people are realizing that when they type in "Caravaggio paintings" into Google, the results are essentially useless. I meet this all the time as students come to me in a total panic and need help *now*. As the public learns more and more about how to manipulate Google results to their own advantage, and the searcher for "Caravaggio paintings" sees, e.g.

BUY MY CARAVAGGIO POSTERS!
BUY *MY* CARAVAGGIO POSTERS!
BUY **MY** CARAVAGGIO POSTERS!!
BUY ***MY*** CARAVAGGIO POSTERS!!!
BUY ****MY**** CARAVAGGIO POSTERS!!!!
...

the Google results will become increasingly useless. In fact, I think Google itself has agreed to this implicitly by implementing the left-hand options for revising the search, including tools such as that eerie Wonder Wheel that hasn't helped me at all! In any case, I can rework the results somewhat more, but for those who understand the traditional methods, these options are still far too elementary. Adding an option to filter for "quality"--however that would come to be defined--would be far more important than, e.g. Sites with images.

These are the sorts of things that could begin to turn Google from a popularity machine good for getting a sense of a "cultural moment in time", into something more durable and ultimately more useful for serious purposes. Yet, this would be reimagination on a huge scale. (I still believe a promising method would be through innovative browser plugins.)

And yes, I think librarians have a huge role, and probably a dominant role, to play. Will it happen? Not so long as they spend their efforts and resources on projects such as RDA which will force catalogers to change their tools and learn how to use them (a tremendous undertaking by the way), while changing nothing at all for our users(!!).

One final note:

<snip>
The only libraries that comes well out of this are those where researchers and historians hang out.
</snip>

Some of my favorite libraries have been the smallest, so long as they are well selected. For example, that is when I began to experience the real power of the classification, which I discovered could actually provoke me in all kinds of ways and led me to some wonderful books I would never have even considered reading before.

RE: Barabbas

Posting to Autocat

On Wed, 7 Jul 2010 00:57:57 -0400, Hal Cain wrote:

>At this point I'm compelled to ask a question I've asked before: why is it regarded as satisfactory to differentiate names simply by adding more terms to one of them? The naive reader sees no reason to suppose that the names "Barabbas" and "Barabbas (Biblical figure)" denote different persons. And, to take my favorite example, it is by no means obvious that "Dods, Marcus" [who lived 1918-1984] is different from "Dods, Marcus, 1786-1838" or "Dods, Marcus, 1834-1909" or "Dods, Marcus, 1874-1935".

This is a good point and in my opinion, is yet another example of something that made much more sense in the card/printed catalog than it does in the OPAC, where by browsing the cards (of necessity!), the user would have seen the records in a type of subarrangement:
Dods, Marcus
Dods, Marcus, 1786-1838
Dods, Marcus, 1834-1909
Dods, Marcus, 1874-1935

In today's keyword environment, browsing has been abandoned by the public so all of this structure has broken down.

This is why I believe that the Wikipedia disambiguation page is much more understandable both to users *and* to catalogers than the traditional methods: http://en.wikipedia.org/wiki/Marcus_Dods or, to take a really difficult name: http://en.wikipedia.org/wiki/David_Johnson

I think the Wikipedia page is clearly better.

>And I see no likelihood that the onset of RDA will change this kind of absurdity, which fosters confusion among our users, and causes ceaseless waste of time to cataloguers who could be doing something more productive!

Completely agreed, but sadly, this waste of time will not only be felt here. RDA, and FRBR, are still based on assumptions that have grown obsolete, and are becoming more obsolete by the day.

Tuesday, July 6, 2010

Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

Alexander Johannesen wrote:
<snip>
Well, I've just been told by the tone police that my mail was too harsh for those tender librarian ears (I'm boggling over that one to be honest, but who's to argue with authority ...), so this will probably be my last email, at least in a while ;
</snip>

I, for one, hope you will reconsider. I have been the recipient of a few of your salvos but things have worked out. We really need voices that are sincere and informed.

Anyway, I think we more or less agree, except that you think selection is just too overwhelming a task, while I think that updating our current methods and changing our "Weltanschauung" (no equivalent in English for that) to include all related fields, would increase our total productivity in an exponential way. I would like to see what my colleagues and I are capable of. As I said in a previous post somewhere, the traditional task of selection was one of *inclusion* i.e. I include this specific item into my collection because my users will find it worthwhile. In the new environment, I think selection will become one of *exclusion* i.e. saying that this stuff is not worthwhile to my users.

The concept of "my users" also needs to be seen in another way, i.e. not only those in my own institution, but to grow and include constituencies literally all over the world. So, if the correct system were built, when I select an item, I would select it for *the entire world* and when it was cataloged, it would be cataloged for *the entire world*. Could this be done? Technically yes. On this list, I think we know this. To do it socially is a completely different matter. By the way, do not underestimate the huge advance the technology represents within just the last 20 or so years. I personally think many of the social obstacles in the way of realizing something like this are based on the fact that people have not caught up socially with the technology and they really do not understand the capabilities.

All of this would have a profound impact on our theories and not least of all, our organizational units, which answer to our local bureaucracies.

I only want to add however:

<snip>
I'm sorry, but by that definition there is no selection. A selection is always a selection from a bigger pool, and as such there *is* filtering, censorship, style / subject / picture police, academic intolerance, short-sightedness, ignorance, stupidity, all depending on who and what.

> I think this shows the difference rather clearly between
> the attitudes of a librarian and those of a faculty member!

Yes, in this anecdotal story where there is a clear villain. :) I hate to tell you this, but librarians are people, too. I've heard a story of a librarian at a general public library who didn't like glossy pages, and as such no books with glossy pages ("dang new fashionable waste of paper, impossible to read!") were ever selected for this library. I'm sure there's many, many stories lurking around in library corridors.
</snip>

Of course, there are lots of stories like this. But it must be stated forthrightly that these people are not doing their jobs correctly. The reasons may range from simply poor training to out-and-out censorship and bigotry. Yet, this shouldn't be surprising: we have been witness to some rather unethical behavior in the business community in the last few years; there are lawyers and doctors and mechanics and--yes--even computer experts (Bill Gates?) who do their jobs incorrectly. :-)

Still, it has always been my opinion that the main part of a library that determines its worth is its selection. You can have libraries with tiny budgets that are very well selected, and wealthy libraries with very poor selection, filled with materials that people will never use. Sometimes there may be a major collection that eventually ceases to be of any use (I am thinking of some 19th-century libraries that specialized in "phrenology"). Plus, someone may have the best, most consistent catalog ever created, and it remains useless if it describes materials nobody wants.

<snip>
I don't think you can do it. And it breaks my heart.
</snip>

How can selection be done? I don't know, but I at least have some ideas. I would love to see my field put out its finest effort before it decides that it is better to ride out into the sunset. But I may agree with you here. I don't know if the field can change that much.

In Defense of the Memory Theater and Information Literacy

Since this does not fit into any of the lists where I participate, this is my first independent blog post.

I would like to bring to everyone's attention an interesting essay: In Defense of the Memory Theater by Nathan Schneider at
http://www.openlettersmonthly.com/in-defense-of-the-memory-theater/

In part of the essay, the author explains how much he loves books and describes his fears of going virtual, bringing up the burning of the library at Alexandria. But more interesting for me was when he mentioned how traumatic it was when he left school and he lost access to all of those wonderful resources:
"But eventually, inevitably, I moved on from the plenty of universities to a string of tiny New York apartments. My little library came with me. In the months that followed, after a countdown of email warnings, my off-campus access to the University of California’s online databases went dead. By then I had already learned that, as sprawling as the New York library systems were, they couldn’t satisfy me like the academic ones had before. Getting there took not just a stop on the way to class, but a subway ride and a trudge through the cold. Most of what I wanted, anyhow, was in the closed stacks at 42nd Street, and I couldn’t take anything there home with me past the watchful guard of the lions out front.

"It was, finally, just me and my bookshelf. At first it wasn’t even a shelf at all, but piles of books scattered around my room on the floor, as orderly as I could manage and as high as they’d get before tumbling...."
Later, he says:
"...for the several years since I lost my borrowing privileges from research libraries and have had to leave my source texts behind, I’ve come to rely on Google and Amazon searchable previews"
This is very similar to my own experience after I left the magnificent collection at Princeton. (I wrote about this in a previous post, now available on my blog at http://catalogingmatters.blogspot.com/2010/03/observations-of-bookman-on-his-initial.html)

Librarians don't seem to want to talk much about this. In our information literacy classes, we tell people about all of these great databases they can access, all of the great books they can get to, and then in the catalog we concentrate on precisely those same materials, but later when those people leave, they don't have access to these materials any longer. What are they supposed to do? Certainly, if they are aware of it, they may be able to get things through ILL, but while librarians consider ILL a success, I suspect most of our patrons view ILLs a bit differently: although the patrons may have a high satisfaction with the ILL service a library provides, how does this impact on how the patron thinks about the local collection? Do they believe that having to request something that is not available locally and wait for it to arrive is evidence that *the local library* itself has failed? From a very quick view of blog posts, I think this may be true. In either case, no matter how well ILL is instituted, it is a pain, and becomes more of a pain especially as the age of "instant-gratification" becomes more and more instant.

So, we teach our patrons to use materials that will probably not be available to them when they leave and go out into the real world. They discover that they are practically helpless without access to JSTOR or Lexis-Nexis. Therefore, if they are not to simply go without, they must use what they find available to them, and while in a previous era, not that long ago, they may have found themselves leafing through 30 year old textbooks or outdated almanacs found on the library's shelves, today they have the Internet and many of the free materials there. It is only natural that they will use and rely on them for their information needs because they won't have anything else, even though they will remember how we told them not to believe what they see on the web. Nevertheless, they will have no choice except to do it, and the final result will probably be only an additional load of guilt added onto their backs.

So in many ways, I think our information literacy classes are like a driving class that trains people to drive only Ferraris or Lamborghinis: 99% of the students will never have access to those kinds of cars again once they leave the class.

Is there anything we can do?

Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

Alexander Johannesen wrote:

<snip>
Jim, give up *now*. There is no way you even have the slightest time to even look at a squirt of what's available. You simply cannot go through it all; not only is it too much (the 2003 estimates are no good; "According to an IBM study, by 2010, the amount of digital information in the world will double every 11 hours.";
http://news.cnet.com/2100-7345_3-6159025.html?part=rss&tag=2547-1_3-0-20&subj=news),
but it is *ever* changing! The Internet doesn't stand still. Even WikiPedia pages change, not only in content, but in links and in meaning (as content change). Resources and content die and gets born in a continuous line that will never end. There's no way for you to go through it all, no way to monitor it, no way to catalog it ... you cannot put it on a shelf. Of course, you can make a copy and catalog the copy, and as such make it obsolete like old books, that's fine, I'm sure you can do that for a selection of sorts. But the sher amount of stuff you have to wade through to even make that selection is simply unsurmountable.
</snip>

It would be the easiest thing in the world to give up, but one thing I have seen that all of my users want--from students to researchers--much more than our cataloging which they find weird, is selection. (That's why I started the thread.) What library selection means in the popular mind is quite different from what it really is. The public has always believed that libraries selected materials because they were the "best" and the most "correct" but that is untrue. Library selectors are only human, and the role of the selector is absolutely *not* to mold the library's collection into a mirror of the selector's own opinions and tastes but instead to help to show people, as much as possible, the range of information that is available.

As an example, I heard a story, perhaps apocryphal but nevertheless enlightening, about a great (unnamed) library collection of Russian literature at a great (unnamed) university offering a doctorate in Russian literature. The person who did the selecting for the library in Russian literature was not a librarian, but actually a great (unnamed) Russian writer and faculty member. It turned out that this great Russian writer hated Dostoyevsky with every drop of (his or her) being and therefore, refused to mention Dostoyevsky in any literature classes, but also refused to purchase anything by or about Dostoyevsky for the library. When this great Russian writer passed away, a librarian was set in place who had to begin to build the collection on Dostoyevsky because, after all, how could an important collection of Russian literature have almost nothing on Dostoyevsky? I think this shows the difference rather clearly between the attitudes of a librarian and those of a faculty member. Faculty members can both have and teach personal opinions about anything they want--after all, that is an important part of their jobs and is vital for academic freedom. But a librarian definitely has other goals. Both are needed but they are quite different.

Therefore, when I am selecting, I must add materials to the collection that I do not agree with, even adding opinions that I violently oppose. My opinions should not get in the way of other people before they form their own opinions. (I wrote a page in my library's information wiki about this http://aurlibrary.wetpaint.com/page/What+is+a+Library%3F)

I agree that selection as it has traditionally been done must change--somehow--for materials on the web. And we *must* face it: selection is happening right now, only it is done automatically through Google's spiders (which do not get everything) and their page ranking algorithm, the details of both of which are quite secret. After all, if something is #500 in a search result, it may as well not exist. Businesses and other organizations understand this very well now and realize that their goods & services must rise to the top, otherwise they die. Therefore we have a strange situation: selection is being done using automated means by a very secretive company (Google), and their "selection policies" (here I am thinking of page rank) can be and are being manipulated to serve the private agendas of all kinds of other individuals and organizations. For a quick overview, see http://en.wikipedia.org/wiki/Search_engine_optimization. These are merely statements of fact.

Again, is the job too big as you suggest? So long as librarians remain mired in 19th and early 20th century thinking and processes, it definitely is. I think selection must change from the traditional methods I mentioned above (i.e. a sense of disinterestedness), not necessarily into that of "what is best" but one that strives to provide alternative opinions on a topic: pro/con, left/right, fascist/anarchist, technical/humanistic, or whatever. We could leave to various types of crowdsourcing the task of "what is best". Such a tool, no matter what it is and perhaps only an addon to Google, would be the "catalog". But no matter what the catalog would be, it means little without the concept of "selection".

In short, selection of materials on the web *is* being done today. I am asking if these rather bizarre, secretive methods of selection are best for us. If we do not deal with the problem of selection, we leave it to some of the most unscrupulous characters in the world who understand its power very well and will do anything--I mean *anything*--to continue manipulating the page ranking algorithms. Going through my library catalog's log files, I have discovered some attempts at spamming that I consider nothing less than works of genius. Too bad these clever minds get dragged into such directions. I don't think it is best that these are the people selecting information for the citizenry, who believe they are looking at the most "relevant" results, but if we leave it all up to Google et al. that is exactly what everybody gets.

I find this potentially extremely dangerous, and that is why I believe there must be something better, although it may not be perfect. Libraries have their codes of ethics, which should make them important and vital players in this world.

That is, so long as librarians are willing to change in fundamental ways. I don't know if they can do it though.

Monday, July 5, 2010

RE: Copernicus, Cataloging, and the Chairs on the Titanic, Part 1 [Long Post]

Posting to NGC4LIB

Stephen Paling wrote:

So, even with these ~very~ generous adjustments, we're still faced with a better-than-hundredfold increase in our workload. That's not 100%, that's 100-fold. For every book we catalog now, we would have to catalog an additional 104 items. If a cataloger currently catalogs a book every 15 minutes, give that cataloger some coffee, because she or he will have to catalog an item every 8.6 seconds. Holy hot sauce! Raise your hand if your cataloging department can absorb that extra work load.

Thanks for these statistics; they give me something to refer to in the future. There are a couple of problems with this analysis. First, it has no room for selection (a topic I raised a while back). If we have blanket selection then yes, I agree there is simply far too much, but much of it is simply not worthwhile.

But once the task of selection is included (and how we can select from this mass I do not know. I have a few ideas, but it is a huge problem that must be solved, I think, before anything else), the numbers will fall drastically.

Then comes the problem of coordination of work. I have written several times that workflows in libraries are still based in the 19th century, as if each library were completely alone. Libraries have cooperated and coordinated to an extent through cooperative cataloging, but still with tremendous duplication every step of the way. Perhaps there is a reason for each library duplicating metadata when each library has a separate copy, but when everybody is looking at exactly the same things on the web, the reasons for duplication evaporate.

Plus, there are possibilities of unimagined cooperative efforts, not only for selection, but for other cooperative projects. If each library were supposed to do everything on its own, it would be a thoroughly hopeless task, *but if* we could imagine other metadata creators working around the world (in Europe, Asia, and everywhere else), *plus* non-library metadata creators, also from around the world, all sharing the work, I would bet that suddenly the 100-fold problem, and even more, would easily fall into the realm of the possible.
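
To make the arithmetic concrete, here is a minimal sketch. The 15 minutes per book and the 104 additional items come from Stephen's message; the `selected_fraction` and `cooperating_catalogers` parameters are my own hypothetical knobs, and the values tried for them below are invented purely for illustration, not estimates of anything.

```python
def seconds_per_item(minutes_per_book=15.0, extra_items_per_book=104,
                     selected_fraction=1.0, cooperating_catalogers=1):
    """Seconds available per item if the cataloging is to keep up.

    selected_fraction: share of the additional items that survive selection
        (1.0 means blanket selection, i.e. everything gets cataloged).
    cooperating_catalogers: number of catalogers splitting the same pool of
        items evenly instead of each duplicating the work (an idealization).
    """
    # Time that today covers a single book, converted to seconds.
    budget = minutes_per_book * 60
    # The book itself plus the additional items that were selected.
    items = 1 + extra_items_per_book * selected_fraction
    return budget * cooperating_catalogers / items


# Stephen's scenario: one cataloger, no selection at all.
print(round(seconds_per_item(), 1))  # 8.6

# A hypothetical scenario: keep 10% of the items, share them among 10 catalogers.
shared = seconds_per_item(selected_fraction=0.1, cooperating_catalogers=10)
print(round(shared / 60, 1), "minutes per item")
```

With the made-up values in the second call, the time per item comes back to roughly 13 minutes, close to the original 15. The point is not the particular numbers but that selection and shared work multiply together.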

This could be done technically. I think everyone on this list knows it. The problem that would then arise would be getting people to follow more or less the same standards, to change and to agree on these changes. If there are no standards of some sort agreed to, and not just for the coding such as dc.creator, but for the more important information that goes inside the coding, everything that comes out will be gobbledygook.

Is it impossible to believe that everybody could agree on some standards? Of course not. People have agreed to lots of international standards in lots of areas. The reason they agree to these standards is to eliminate duplication and increase efficiencies exponentially.

I fully realize that this is idealistic. But these sorts of impossibilities have happened before, and somebody, somewhere has to envision the impossible before it can finally come to pass.