Wednesday, May 30, 2012

Re: [ACAT] "What Does It Mean To Be Literate in the Age of Google?"

Posting to Autocat

On 29/05/2012 19:19, Kevin M Randall wrote:
<snip>
I guess we'll just have to keep waiting for your summary, because all I could find in the blog post you referenced is *yet another* restatement of the problem you're talking about, with no concrete suggestions at all. Not only that, I could find no *abstract* suggestions, either--unless you mean "provid[ing] links to 'reliable' resources" to be an actual suggestion.
</snip>

Well, I guess so, although I seem to remember some specific suggestions in the paper at Oslo at least. In fact, I even made a special web page as a prototype. That's a lot more than many have done.

Still, one of the first things to do is admit that the problems will not be fixed by changing cataloging rules or cataloging formats. It is my stance (as I have stated over and over) that the *library* is endangered and only the *library* can respond: not catalogers, not reference librarians, not selectors--these are all bureaucratic roles based primarily on the workflows of 19th- and 20th-century library departments--but *librarians* as a whole. Therefore, all must expand their horizons.

So long as people insist on pretending that FRBR provides anything the public wants and needs (without demonstrating it), and that RDA is worth paying for and implementing without a business case (and I think we all know that implementing FRBR will be even more expensive), it will be impossible to move forward. As I wrote in my paper in Oslo, *nobody knows* the way forward now. Nobody. That includes me, so unfortunately I cannot play the role of Mr. Fix It.

We need to focus on the selection, description and organization, reference, and access of materials in the information available to our *patrons*, and not only in our own local collections. I personally believe that the #1 task with the most public impact would be selection, but that would overwhelm cataloging. What would reference do? This means entirely new solutions. A lot of those solutions will be determined by answering uncomfortable questions, such as: what is the purpose of the local catalog today? I'm not sure what it is, but somehow it must work with the tools that the Google guy mentioned. Otherwise, librarians--especially catalogers--are history.

But here is one concrete suggestion: get rid of the job title "cataloger". Although it pains me to mention it, and I think it is a fine job title, I fear that in the popular mind it recalls days long past and is therefore the kiss of death.

Re: [ACAT] "What Does It Mean To Be Literate in the Age of Google?"

Posting to Autocat

On 29/05/2012 18:05, William Anderson wrote:
<snip>
Perhaps, summarize these suggestions might be a better way of expressing this.
</snip>

I am busy writing some things on this now, but here are some of my own ideas from my writings, now found on my blog:

http://blog.jweinheimer.net/2011/08/re-day-made-of-glass_17.html
(Discussing the amazing Day Made of Glass video/ad at http://www.youtube.com/watch_popup?v=6Cf7IL_eZ38&vq=medium. I just discovered a part 2 to this by the way: http://www.youtube.com/watch?v=jZkHpNnXLB0&feature=youtu.be)
http://blog.jweinheimer.net/2012/02/is-rda-only-way-alternative-option.html
http://blog.jweinheimer.net/2012/02/revolution-in-our-minds-seeing-world.html

Also, after you watch the video by the Google guy, I am sure it will make you think. It really is a new world.

Re: [ACAT] "What Does It Mean To Be Literate in the Age of Google?"

Posting to Autocat

On 29/05/2012 16:46, William Anderson wrote:
<snip>
James Weinheimer wrote:
Of course, our tools must fit into these new types of space-age technology, and it becomes clearer that the "user tasks" as enumerated by FRBR seem more and more tired and antiquated by the minute.
The tart answer might be to offer up your own enumeration of such tasks. Presumably not all tasks apply to all users or all questions.
</snip>
I have made several suggestions in the past. And to head off at least some criticism: yes, of course some people still want to do the FRBR user tasks, just as there are still some who want to walk, or ride a horse, from Venice to Rome. (There are still real, live pilgrims, just like in the old days. Not many, but a few are still around.) Of course, most people, including myself, prefer to drive, fly or take the bullet train, where I can enjoy the trip from the club car.

But when a major and expensive project such as FRBR and RDA is begun to create a product that is demonstrably obsolete, especially during hard fiscal times, it seems only reasonable to question whether it is aimed at making something that people want. Google is finding out what people want; so are Facebook, Microsoft, and lots of other big organizations.

I will always say that libraries can provide something important to the public that these other organizations do not and possibly cannot, but it is not FRBR. It is really imperative that we divest ourselves of this idea.

Tuesday, May 29, 2012

"What Does It Mean To Be Literate in the Age of Google?"

All,

Apologies for cross-posting, but I think this is important for everyone.

I would like to suggest that everyone watch a public lecture from Princeton by Daniel Russell, who has the improbable job title of Über Tech Lead for Search Quality and User Happiness at Google. His talk was "What Does It Mean To Be Literate in the Age of Google?" http://hulk03.princeton.edu:8080/WebMedia/flash/lectures/20120228_publect_russell.shtml. It's rather long, but I think the talk is very important for all librarians. Some of the questions he is asked are not bad either.

I have mentioned in several postings that it is important to build tools that fulfill the needs of the public and not only our own needs. Of course, one of the main problems is that we need to find out what the public wants, but even beyond that, since systems have changed, and are changing so radically, we also need to find out what is even possible to do with the new search powers at our disposal. This lecture provides that level of information. Among many examples, he mentions his blog where he asks "research questions", http://searchresearch1.blogspot.it/. He gave an example of one of those questions: it was a photo of an unknown cityscape taken out of a window. The question was: what is the phone number of the office where the picture was taken? Apparently, it can be done today!

While that is not such a realistic question, the very fact that it could be answered verges on the incredible. Of course, few people are able to do it (I can't, although the answer is on his blog). His solution, strangely enough, is that everyone should train themselves--to become "informate", as he puts it.

My own opinion is that searching the library catalog was and is immeasurably simpler than what he suggests, and librarian experience shows that even that turned out to be too much. To do what he suggests requires much more skill, practice and time, and I do not think it is a solution. Nevertheless, someone needs to be the expert in the kind of searching he describes, and this would seem to be a perfect niche for librarians. He mentioned two qualities needed to become "informate": when searching you must be persistent and creative. Librarians fulfill both of those requirements.

Of course, our tools must fit into these new types of space-age technology, and it becomes clearer that the "user tasks" as enumerated by FRBR seem more and more tired and antiquated by the minute.

Saturday, May 26, 2012

Re: [NGC4LIB] cover image and amazon

On 26/05/2012 04:22, Karen Coyle wrote:
<snip>
I think this is a bit of a misunderstanding of how linked data can be used. Yes, *someone* could merge WorldCat and Amazon (if both had open data), but that wouldn't affect either WorldCat nor Amazon. The way I try to explain this is that it's like putting a document onto the web: other people can link to it, or pull bits of it into their web sites, but your document itself isn't changed -- because it's your document. Mashups happen somewhere else, they don't modify the original data.

The reason why there is a lot of work on adding provenance information to each bit of data in linked data space is so you can 1) easily identify your or your community's data 2) select data to use based on who creates and maintains it. You do not have to use someone else's data if you don't want, any more than today you are forced to show other people's stuff on your own web page. In a Linked Data environment, libraries could continue to use only library-created data. I don't advise that (since I think we have a lot to gain by linking to quality data from other communities), but it is a possibility.
</snip>
I believe I understand all of that, but it is a theoretical viewpoint. Although the original data is not changed, what is important is the final product as experienced by the individual at the end of those processes, as each mashup takes place. I personally do not see why in the future people should turn to library catalogs or even Worldcat any more than they do today; if anything, the numbers will probably go down. There may be no antidote for this; it is merely a symptom of the library losing its traditional control over its collection, its searches, and the results of those searches. In the linked data universe, all that will matter will be the final product that a person sees, and it will vary from person to person based on all kinds of variables. Library records must go to the public, because the public will probably not come to the library unless they find out about our resources in the places they inhabit.

As a result, there will be multiple places where libraries will see their metadata being used, including sites that will be considered, in traditional terms, very strange if not unethical. Already there are links from the catalog record to where someone can buy the item; those would have been banished by librarians 20 years ago. The questions are natural: if someone finds a book in Worldcat and ends up buying it on Amazon, does Worldcat get paid? If not, why not? Should it be only a one-way street--shouldn't the one hawking the product get something, or should the seller get it all? And what of the actual library, and the cataloger(s), who created the record in the first place? They made it all possible. Shouldn't they get a piece of the action?

Such commercialization can easily devolve into not only various financial matters, but also moral, political and religious ones, and so on. This is the reason for some of those wonderful library ethics, and why reference librarians do not receive $10 or $20 each time they send an unwary student to search an Elsevier or Proquest database, or why a cataloger who works for university X does not do a really lousy job creating metadata for resources created by competing university Y. Could this happen in the linked data universe? Of course; it happens all the time on the web right now. Here it is with Google http://www.bbc.com/news/technology-18143812 and who knows how else it is all being abused. This is one of the reasons I did that podcast on linked data--to show that while there is an idealistic, positive side, there is also another side that should not be ignored.

We must enter the world of APIs and mashups, and we can test the waters of the linked data universe to see if it would be worth the effort. Perhaps such a turn for our metadata as I have laid out is inevitable, because it has to do with the internal workings of how the current world wide web functions--I don't know. But even if it is, we should enter into it with eyes wide open, in the knowledge of what we are gaining as well as what we are losing.

Re: [NGC4LIB] cover image and amazon

On 25/05/2012 19:10, Ross Singer wrote:
<snip>
On May 25, 2012, at 1:06 PM, James Weinheimer wrote:
Of course, this entire matter becomes exponentially more chaotic when and if we enter the world of "linked data" (heavenly chorus). 
How?
</snip>
Because when everything can be linked and mashed up into new creations, information may lose its original purpose and shape. The records in Worldcat can now have links to Amazon, and they don't have to remain just links: the Amazon record could actually be merged with the records in Worldcat. In the greater world of linked data, the records can be mashed up to include ads from everywhere, and who knows what else they will come up with to sell their products?
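To make the idea concrete, here is a toy sketch in Python. All of the data, field names and URLs are invented; this is not Worldcat's or Amazon's actual data. The point is only that the mashup produces a new final product for the user while the source records themselves are never touched:

# A toy mashup: a library record and a bookseller record merged into the
# display record a user would finally see. Neither source is modified.
library_record = {
    "title": "Sult",
    "author": "Hamsun, Knut, 1859-1952",
    "subjects": ["Norwegian fiction"],
}

seller_record = {
    "title": "Hunger",                                    # the seller's form of the title
    "price": "$12.99",
    "buy_link": "https://bookseller.example.com/hunger",  # placeholder URL
    "ad": "Customers also bought...",
}

# The "final product as experienced by the individual": library data plus
# whatever the aggregator chooses to pull in. Note that the seller's title
# even overrides the library's in the merged view.
mashup = {**library_record, **seller_record}
print(mashup)           # what the end user sees
print(library_record)   # the original library record remains unchanged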

Of course, companies will use every opportunity they possibly have to sell their goods--to believe anything else is naive.
http://www.nytimes.com/2012/01/24/world/europe/debt-ridden-greece-turns-to-sacred-sites-for-cash.html

I am not saying there is anything necessarily wrong with this, but I believe the library is special since there are ethical considerations, e.g. from http://www.ala.org/advocacy/proethics/codeofethics/codeethics
  • We provide the highest level of service to all library users through appropriate and usefully organized resources; equitable service policies; equitable access; and accurate, unbiased, and courteous responses to all requests.
  • We uphold the principles of intellectual freedom and resist all efforts to censor library resources.
  • We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
  • We do not advance private interests at the expense of library users, colleagues, or our employing institutions.
  • We distinguish between our personal convictions and professional duties and do not allow our personal beliefs to interfere with fair representation of the aims of our institutions or the provision of access to their information resources.
These statements are not shared by all entities going into the linked data universe. Perhaps the tools will have to change but it does seem fair to discuss it.

Re: [NGC4LIB] cover image and amazon

On 25/05/2012 18:21, Karen Coyle wrote:
<snip>
On 5/25/12 8:38 AM, john g marr wrote:
On Fri, 25 May 2012, Sarah Sherman wrote:
Just wondering how you source cover images for your catalog?
We don't. Since cover images are simply emotionally manipulative advertising, they don't belong in a purely informational library catalog.
I would counter that many of the books themselves are emotionally manipulative and commercial. Hey, even our patrons are manipulative and emotional! Let's eliminate the books, eliminate the patrons, and just stick to the facts. No, to the ones and zeroes. Nothing but ones and zeroes. 010100001011100011...
</snip>
I do believe that John raises a valid point. If, in the days of the card catalog, the author or publisher of a book had put advertisements or even an order form for the book in with the card, it would never have been allowed; in fact, librarians would have been absolutely appalled. But today it seems to be OK. In fact, in the report from OCLC, "The Use of Eye-Tracking to Evaluate the Effects of Format..." by Michael Prasse, http://www.oclc.org/news/publications/whitepapers/WC_ITrack_AReport_23Feb2011.pdf, one of the main complaints of the research participants was that it was too difficult to purchase books online--which completely ignores the idea that somebody should be able to get the book for free! This is bizarre, at least today. But I confess that lots of things I see on the web would have made me fall off my chair if I had seen them 20 years ago. The world, and we ourselves, really are changing that fast.

Nevertheless, there is a certain commercial aspect that seems to be creeping in. If you look at how Worldcat works, e.g. http://www.worldcat.org/title/aftershock-the-next-economy-and-americas-future/oclc/495781520, there is a little icon that says "Buy it" which gives you some options to buy the book. If you look at the same item in Google Books http://books.google.it/books?id=pzVxn-884JAC, you will see lots of links to various booksellers (obviously based on location, since I see Italian sites) before you see the link to the libraries. I wonder how many people actually click on "Find in a library" as opposed to a seller. Of course, if the agreement had been accepted and the full text were available on Google Books, it would be another matter entirely....

Do we just close our eyes to all of this? Should the catalog be a part of such a commercially crass situation when it never would have been before?

I really do not know what I think about this. I don't want to be called a Luddite, but neither do I want to let searchers of the catalog be more exploited than they are already.

Of course, this entire matter becomes exponentially more chaotic when and if we enter the world of "linked data" (heavenly chorus).

Friday, May 25, 2012

Re: [ACAT] From privacy to Orwell

Posting to Autocat

On 23/05/2012 22:45, Kyrios, Alex wrote:
<snip>
I think one of the other problems we as librarians encounter is that in a world where it's increasingly intelligent to assume you have no privacy, we, as a whole, continue to hold an almost religious level of zeal for maintaining user privacy. Now, I certainly don't see this as a bad thing. We can certainly parlay this into a strength, with marketing that says to patrons, "You have no privacy online, but with us, you're safe." So why is this problematic? People are used to Google and Amazon's suggestive algorithms based on their history, and I suspect many users today believe we have such technology, when in most cases, we specifically avoid such tracking of reading history. I would suggest not that we jump on the bandwagon, but that we keep such records when a patron wants us to, either in an opt-in or an opt-out system, depending on how cautious we want to be. I suspect some libraries are already doing this. As long as we exercise due responsibility in informing patrons that, say, the Patriot Act could make such records available to the government, I think we could please the majority of people involved. Bottom line, more options for patrons is a win for everyone.
</snip>
While I personally agree completely and think that libraries need to begin in earnest to create their own roads instead of merely following those of others, it remains to be seen whether these issues are really important to the public. I have mentioned before that I believe one of the main ways the library field can survive and thrive is if we focus on the ethical aspects of librarianship: privacy, lack of bias, not making money off of what we suggest, etc. Also, I believe that our conceptual searching, as opposed to searching of plain text, could be exceptionally valuable.

But the public must both be made aware that this is what they get when they search a library catalog, and come to appreciate it. I don't believe it will be an easy task. I honestly feel that people want an unbiased selection of worthwhile materials, and then to be able to navigate through that information in a conceptual way, not just through text. Yet strangely, such ideas have become completely foreign to many members of the public and are difficult to understand. Unfortunately, library catalogs are still very clumsy tools and it is tough to demonstrate their advantages.

Still, it is critical for libraries to strike out in new directions. Privacy may be a fruitful area for some concerned citizens, but as I tried to point out in my last message, this means you opt out of Web2.0. This must be addressed. Otherwise, I fear that libraries will be fated to remain--in the popular mind--as an "inferior Google" or a "poor type of amazon.com".

Wednesday, May 23, 2012

Re: [ACAT] From privacy to Orwell

Posting to Autocat

On 23/05/2012 01:46, john g marr wrote:
<snip>
We all need to be aware of privacy issues, and, in fact, we generally are (e.g. whether to turn circulation records over to the govt.).

I didn't see this article posed earlier, so thought I'd drop it into the mix. "The Terrifying Ways Google Is Destroying Your Privacy" / by David Rosen. Here's the URL:
http://www.alternet.org/story/155479/the_terrifying_ways_google_is_destroying_your_privacy?
</snip>
My own opinion on this--again, trying to return the conversation to the catalog and the library--is that this actually deals with the changes in the expectations of the public. I will take issue, at least to a point, with one statement made there: "Two complementary forces are driving this change: short-term corporate self-interest and a self-serving security-state. The ordinary American’s traditional privacy rights are giving way to the demands of the militarized corporate state. They are determining America’s digital economy and future."

There is at least one other force at work here: the existence of the Web2.0 tools, or the collaborative web. These "wonderful" search tools that provide you with results somewhat tailored to what you presumably want are premised on getting as much information about *you* as possible. When I tell people how Google and all of these sites put "cookies" on their machines so that they can be tracked--how to find these cookies, what they look like and what it all means--people want to stop storing them. I show them how to delete cookies, and even how to shut them down completely, but I also say that cookies are absolutely necessary in the internet environment. [For those who don't know, a cookie is just a string of numbers and letters left within your browser so that a remote machine knows that you really are who you say you are. In the current web environment, it is the only real option. http://en.wikipedia.org/wiki/HTTP_cookie]
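To make that a little less abstract, here is a minimal sketch of the mechanism itself using only the Python standard library. The cookie name and value are invented for illustration; real sites set their own:

from http.cookies import SimpleCookie

# The server sends an opaque identifier once, in its response headers...
server_cookie = SimpleCookie()
server_cookie["session_id"] = "a1b2c3d4e5"            # just a string, not personal data in itself
server_cookie["session_id"]["domain"] = "example.com"
server_cookie["session_id"]["path"] = "/"
print(server_cookie.output())                          # prints the "Set-Cookie: ..." header line

# ...and the browser sends it back with every later request to that domain,
# which is how the remote machine recognizes the same visitor each time.
returned = SimpleCookie()
returned.load("session_id=a1b2c3d4e5")
print(returned["session_id"].value)                    # -> a1b2c3d4e5

The tracking problems come not from the string itself but from who gets to set it and read it back, which is what the rest of this discussion is about.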

Of course, there are always people and organizations who try to take advantage of any openings they can find, and cookies offer a highly convenient point. There is a very interesting and frightening TED talk on this by Gary Kovacs: "Tracking the trackers" http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html where he talks about the Collusion add-on for Firefox.

I installed the Collusion add-on and found it not quite so bad as I thought. Some of these trackers are sites like Google Analytics, which tells you how people are using your site. I use it on my blog. You don't get any personal information, but you do know that during the last hour or week or month x number of people got onto your site and looked at these pages; there is a breakdown by country, by how long people stayed, by the search terms on Google or Bing they used to find your site, by how many were newcomers and how many were returners, and so on. In this case, I personally do not think this is a violation of anyone's privacy. I placed a screen clip of one part of mine where I looked at The Guardian newspaper, and these are the connections: http://www.jweinheimer.net/images/collusion.png. I see some I do not know, but there is Twitter, LinkedIn, Facebook, Google, and some others.

But, the moment you log into a site, a specific cookie can be placed into your specific machine and then you can be tracked. Google does this for anyone with a Google account, and this can happen anytime you log into any account, including a library account. I must say that it would be interesting, and maybe even useful, to know that a professor or grad student or undergrad in a specific department logged on and searched for specific terms.

I may not like many of these developments, but they are part of the very fabric of the internet. It would be very difficult at this point to change them. On the mobile web, this is probably magnified.

To return to the catalog: if we are to use the "wonders" of modern search in our catalogs, we seem to be stuck. To renounce them would be to ignore the expectations of our public.

Monday, May 21, 2012

Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

Posting to RDA-L

On 21/05/2012 18:06, Karen Coyle wrote:
<snip>
Obviously, you can do what you want with FRBR inside your own system, but we're talking about massive sharing of data. It's the sharing part that matters. The danger is that the library community will form standards that are widely followed but that are not a good idea. Or that deteriorate over time, like MARC, but we're so stuck to our standards that we can't imagine changing. If you actually look at that page and read the arguments there, rather than just shoot back an email telling me that I don't know what I'm talking about, you might see why some folks are concerned.
</snip>

Yes, sharing data, and sharing it in the ways seen in the Linked Data world, is entering unknown territory. The non-libraries who are already there, and those who are trying to get there, are not waiting for libraries to show them the "right" ways to do it. I don't think they really care whether library metadata is added or not. Therefore, it is up to libraries to enter *their* world in the best ways possible and not expect everyone to follow us.

I personally cannot believe the FRBR structures/ontology will be widely followed, but to expect the (weird) WEMI structure to magically become compatible with other structures that are only W or E or M or I, or strange amalgamations that change constantly or are generated dynamically--through XSL transformations, on-the-fly transformations such as Google Translate, or browser plugins--is taking a lot for granted. What I personally believe is that WEMI is more a remnant of the print/physical world and has little to do with most digital information.

Not that most members of the public want WEMI anyway.

Wednesday, May 16, 2012

Comment to: Improving the presentation of library data using FRBR and Linked data

Comment to:  Improving the presentation of library data using FRBR and Linked data by Anne-Lena Westrum, Asgeir Rekkavik and Kim Tallerås. Code4Lib Journal, Issue 16, 2012-02-03

This is a very interesting article for anyone interested in how the public works with library catalogs. One additional aspect that would be especially interesting is a comparison of how patrons respond to the result sets foreseen by FRBR versus those available through modern indexing practices. These can be seen in Worldcat now.

To take your example of Hamsun's Sult, the uniform title is Hamsun, Knut, 1859-1952. Sult, and searching this correctly in Worldcat http://www.worldcat.org/search?q=ti%3A%22sult%22+au%3A%22hamsun%2C+knut%22 gives 156 results that the patron can narrow by format, other authors, dates, languages, and so on. This list can also be sorted by author, title, date, etc. Other indexes could be created too.
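For anyone curious how that search is put together, here is a minimal sketch: the ti: and au: indexes are simply text prefixes inside the q parameter, which then gets URL-encoded. The index names are taken from the URL above; everything else is ordinary encoding:

from urllib.parse import urlencode

# Build the fielded Worldcat query shown above: a title index and an author index
query = 'ti:"sult" au:"hamsun, knut"'
url = "http://www.worldcat.org/search?" + urlencode({"q": query})
print(url)
# -> http://www.worldcat.org/search?q=ti%3A%22sult%22+au%3A%22hamsun%2C+knut%22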

Getting this kind of result of course requires catalogers to add the uniform titles consistently and correctly. But that is part of their job.

Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

Posting to RDA-L

On 15/05/2012 17:53, Jonathan Rochkind wrote:
<snip>
Frankly, I no longer have much confidence that the library cataloging community is capable of any necessary changes in any kind of timeline fast enough to save us.

Those that believe no significant changes to library cataloging or metadata practices are necessary will have a chance to see if they are right.

I believe that inaction -- inability to make significant changes in "the way our data is currently recorded and maintained" to accommodate contemporary needs -- will instead result in the end of the library cataloging/metadata tradition, and the end of library involvement in metadata control, if not the end of libraries.  I find it deeply depressing. But I no longer find much hope that any other outcome is possible, and begin to think any time I spend trying to help arrive at change is just wasted time.
</snip>

I think many share your fears. I certainly do, but it is important not to give up hope. The problem as I see it is that while everyone agrees that we should move "forward", we don't even know which direction "forward" is. Some believe it is east, others west, others north, others up, others down. Nobody knows. Is the basic problem in libraries "the way our data is currently recorded and maintained"? For those who believe this, it follows that if libraries changed their format and cataloging practices, things would be better.

But this will be expensive and disruptive. That is a simple fact. And undertaking something like that during such severe economic times makes it even more difficult. So it seems entirely logical to ask whether this *really will* help, or whether those resources would be better used to do something else. In fact, this is such a natural question that not asking it makes people raise their eyebrows and wonder if there really is an answer. This is why I keep raising the point of the business case. It is a fundamental, basic task.

And another fact is that if we want to make our records more widely available in formats that others can use, it can be done right now. Harvard is doing it with their API: http://blogs.law.harvard.edu/dplatechdev/2012/04/24/going-live-with-harvards-catalog/ They say their records are now available in JSON using schema.org, in DC or in MARC, although all I have seen is MARC so far. Still, kudos to them! It is a wonderful beginning!
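Consuming that kind of API takes almost nothing. Here is a rough sketch of the general pattern--the URL, parameters and field names below are invented placeholders, not Harvard's actual interface--just to show that repurposing open bibliographic data needs no more than an HTTP request and a parser:

import json
import urllib.request

# Placeholder endpoint standing in for any open bibliographic API that returns JSON
API_URL = "https://api.example.edu/v1/items?q=hamsun&format=json"

with urllib.request.urlopen(API_URL) as response:
    data = json.load(response)

# Field names here are invented; a real client would follow the API's own documentation
for item in data.get("items", []):
    print(item.get("title"), "|", item.get("creator"))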

So it is a fact that the library community does not have to wait for RDA, FRBR or even the changes to MARC to repurpose their data. Would it be perfect? Of course not! When has that ever had anything to do with anything? Everyone expects things to change constantly, especially today. A few years of open development using tools such as this would make the way "forward" much clearer than it is now. Then we could start to see what the public wants and needs and begin to design for *them* instead of for *us*. If we find that there is absolutely no interest in open development of library tools, that would say a lot too.

To maintain that RDA and FRBR are going to make any difference to the public, or that they are necessary to get into the barely-nascent and highly controversial "Linked Data", is simply too much to accept. Each represents changes, that's for sure, but theoretical ones that happen almost entirely behind the scenes and whose value has yet to be proven--all this in spite of the incredible developments going on right under our noses! Therefore, it seems only natural to question whether RDA, FRBR and "Linked Data" truly represent the direction "forward", or whether they are actually going in some other direction.

On a more positive note, I think there are incredible opportunities for libraries and librarians today.

Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

Posting to RDA-L

On 15/05/2012 16:50, Jonathan Rochkind wrote:
<snip>
I certainly agree that the way our data is currently recorded and maintained in MARC is not suitable for contemporary desired uses, as I've suggested many times before on this list and others and tried to explain why; it's got little to do with rdbms though.
</snip>

Although MARC needs to change, and has needed to for a very long time, I don't see how changing the format would improve the subject headings. The semantics are there already, so searching would remain the same. It is the display of multiple search results that has disintegrated. I think there are lots of ways the displays could be improved for the public--primarily by making them more flexible, something that could be experimented with now--but even then, there will need to be a major push from public services to get the public to use and understand what subject searches are. All of it has been effectively forgotten by the public.

For a whole lot of reasons, library subject searches will always be substantively different from what people retrieve from a full-text search, and while librarians can understand this, it is a lot harder for the public.

Tuesday, May 15, 2012

Re: [RDA-L] Part 2: Efficiency of DBMS operations Re: [RDA-L] [BIBFRAME] RDA, DBMS and RDF

Posting to RDA-L

On 15/05/2012 02:52, Karen Coyle wrote:
<snip>
let's say you have a record with 3 subject headings:
Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards and therefore should you look all through the subject card catalog you would see the book in question 3 times.

In a keyword search limited to subject headings, most systems would retrieve this record once and display it once. That has to do with how the DBMS resolves from indexes to records. So even though a keyword may appear more than once in a record, the record is only retrieved once.
</snip>

I don't believe that is correct. That kind of search result should be a programming decision: whether to dedupe or not. It seems to me that a record with "France" three times in it could easily display three times in a search result if you wanted it to. With relevance ranking, or ranking by date, etc., it makes little sense to display the same record three times, although I am sure you could. Having a record display more than once makes sense only with some kind of browsable heading display, but I have never seen that with a keyword result.
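To show that it really is just a programming decision, here is a minimal sketch with invented data: the keyword matches three headings on one record, and the code can emit either one line per matching heading (like the card catalog) or one line per record (like most keyword results):

# One record whose three subject headings all contain the keyword "France"
record = {
    "id": 1,
    "title": "Example title",
    "subjects": [
        "Working class -- France",
        "Working class -- Dwellings -- France",
        "Housing -- France",
    ],
}

hits = [(record["id"], s) for s in record["subjects"] if "france" in s.lower()]

# Card-catalog style: one line per matching heading (the record appears three times)
for rec_id, heading in hits:
    print(rec_id, heading)

# Keyword-search style: deduplicated to one line per record
for rec_id in sorted({rec_id for rec_id, _ in hits}):
    print(rec_id, record["title"])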

All of this is a great example of how our current subject heading strings just don't function today, and they haven't ever since keyword searching was introduced. Computerized records work much better with descriptors than with traditional heading strings; for instance, your example would be something like:
Topical Subjects: Working class, Dwellings, Housing
Geographic Subject Area: France.

Here, there is no question since "France" appears only once in the subjects.

Seen in this light, our subject headings are obsolete, but nevertheless I believe that our subject headings with subdivisions provide important options found nowhere else, as I tried to show in the posting I mentioned in my previous message. But really, how the subject headings function must be reconsidered from the foundations; otherwise they really are obsolete.

The dictionary catalog really is dead, at least as concerns the public.

Monday, May 14, 2012

Re: [RDA-L] RDA, DBMS and RDF

Posting to RDA-L

On 13/05/2012 19:49, Karen Coyle wrote:
<snip>
All,

After struggling for a long time with my frustration with the difficulties of dealing with MARC, FRBR and RDA concepts in the context of data management, I have done a blog post that explains some of my thinking on the topic:

http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html

The short summary is that RDA is not really suitable for storage and use in a relational database system, and therefore is even further from being suitable for RDF. I use headings ("access points" in RDA, I believe) as my example, but there are numerous other aspects of RDA that belie its intention to support "scenario one."

I have intended to write something much more in depth on this topic but as that has been in progress now for a considerable time, I felt that a short, albeit incomplete, explanation was needed.

I welcome all discussion on this topic.
</snip>
This is really good. I question whether libraries primarily need a new relational database model for our catalogs, especially one based on FRBR. I still have never seen a practical advantage over what can be done now. The power of the Lucene-type full-text engines, the searches they allow, and their speed are simply stunning, and nothing can compare to them right now. There are variants such as the Zebra indexing system in Koha, which was created for bibliographic records and is very similar to Lucene. http://www.indexdata.com/zebra and the guide http://www.indexdata.com/zebra/doc/zebra.pdf.

A relational database would be far too slow if used in conjunction with a huge database such as Google's. So some catalogs use the DBMS only for record maintenance; everything is then indexed in Lucene for searching, while the displays are made from the XML versions of the records. The DBMS is there only for storage and maintenance. This is how Koha works, and it could be more or less how Worldcat works as well, but these are not the only catalogs that work like this.
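Schematically, the division of labor looks something like the sketch below. This is not Koha's actual code--the table, the tokenizing and the tiny in-memory index are all invented stand-ins--but it shows the pattern: the relational store only keeps the records, a Lucene-style inverted index answers the searches, and the display is built from the stored XML:

import re
import sqlite3

# 1. The DBMS: storage and maintenance only
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE biblio (id INTEGER PRIMARY KEY, marcxml TEXT)")
db.execute("INSERT INTO biblio VALUES (?, ?)",
           (1, "<record><title>Sult</title><author>Hamsun, Knut</author></record>"))

# 2. "Indexing": extract searchable words into an inverted index; a plain dict
#    here stands in for Lucene/Zebra/Solr
index = {}
for rec_id, xml in db.execute("SELECT id, marcxml FROM biblio"):
    text = re.sub(r"<[^>]+>", " ", xml)          # crudely strip the tags
    for word in re.findall(r"\w+", text.lower()):
        index.setdefault(word, set()).add(rec_id)

# 3. "Searching": the query goes to the index, never to the relational tables
matching_ids = index.get("hamsun", set())

# 4. Display: built from the stored XML of the matched records
placeholders = ",".join("?" * len(matching_ids))
for (xml,) in db.execute(
        "SELECT marcxml FROM biblio WHERE id IN (%s)" % placeholders,
        tuple(matching_ids)):
    print(xml)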

Still, I will say that much of this lies beyond the responsibility of cataloging per se, and goes into that of systems.

But on the other hand, your point that library headings are not "relational" and are actually based on browsing textual strings really is a responsibility of cataloging. It is also absolutely true and should be a matter of general debate. The text strings haven't worked in years, because what worked rather clearly in a card catalog did not work online. I've written about this before, and there was a discussion on Autocat not too long ago. Here is one of my posts where I discussed the issue and offered an alternative to the current display of the headings found under Edgar Allan Poe: http://blog.jweinheimer.net/2012/04/re-acat-death-of-dictionary-catalog-was.html

I still maintain that we do not really know what the public wants yet. Everything is in a state of change right now, so it will take a lot of research, along with trial and error, to find out. I do think that people would want the traditional power of the catalog, but they will not use left-anchored text strings. The way it works now is far too clunky and new methods for the web must be found. Paths such as you point out would lead to genuine change and possible improvements in how our catalogs function for the public, which is the major road we need to take.

Thursday, May 10, 2012

Re: [NGC4LIB] Google Penguin and SEO

Posting to NGC4LIB


On 09/05/2012 21:18, john g marr wrote:
<snip>
On Wed, 9 May 2012, Laval Hunsucker wrote:
James Weinheimer wrote :
A search for "ebooks" has www.ebooks.com come up second to Project Gutenberg.
Now that rather depends, doesn't it, on ( i.a. ) the who and when and from where of the search?
True, but it is also indicative of the fact that search results (like everything else, including customers and searchers) can be manipulated from a number of directions for individual advantage. The question then becomes how to manipulate manipulation to become egalitarian.
Reward and punishment should not be part of the library's tools.
Possibly not, but they always have been, and I don't know how they can be eliminated.
Your comment is significant in placing emphasis on static assumptions and personal knowledge. We should instead be consistently questioning stasis and taking the "possibly" corrupting presence of reward and punishment systems as evidence of a need to explore how they can be supplanted by egalitarian (I'm starting to like that word) collaborative tools.
</snip>
Interesting ideas. Laval's comment that each person sees search results that are individually personalized (based also on IP address) is absolutely correct and only emphasizes the problem. My intention was to focus on the library tools that try to ensure reliable results, rather than the disputable "better" or "more useful" ones. "Reliable" means an entire raft of things, from being able to find the resources I found last week, and find them in the same ways I found them back then, to making sure that the items still exist so that they can be found in the first place. For dynamic sites, perhaps the site will look different and even have different information, but that is another task. Searchers, I think, need to be assured that they can examine materials that they saw a few weeks ago or a few years ago. This should be a minimum. It's difficult enough now, but with "improvements" such as Google Penguin and Panda, it could be made so complex as to render it practically impossible. I didn't mention that everyone merely assumes that Google's motto "Don't be evil" is followed today and will be followed in the future, because this power could certainly be used for evil.

I don't know if I would agree that "reward" and "punishment" are part of the library's tools, except for punishing the searchers who have to know and follow the obsolete methods of how to search the catalogs, indexes, and other kinds of bibliographic tools if they want to find things effectively. The practice of cataloging certainly attempts to treat all resources in an egalitarian manner (although I confess that I always sort of zoned out when it came to semiotics texts!). Minimal level cataloging decisions could be seen as "punishment" although I see it more as a lack of adequate resources. But, perhaps these two ideas are more closely connected than I imagine.

Selection could realistically be called "reward" and "punishment", but that is also mostly due to lack of resources. Most selectors would gladly get many more materials for their patrons if they had more money, space, and staff. Reference could definitely punish some publications, since reference librarians can send people to other resources and say negative things about specific ones. And yet, this can also be considered "professional opinion". In any case, this should result in very little harm for the authors and publishers of the resource, especially in comparison with Google's lowering them in the results list, often to the second or third page, where they are effectively in limbo.

Wednesday, May 9, 2012

Google Penguin and SEO


Posting to NGC4LIB
 

The blogosphere has been discussing the latest updates to the Google Search algorithm, called "Google Panda 3.5" and "Google Penguin", announced April 24 of this year http://googlewebmastercentral.blogspot.co.uk/2012/04/another-step-to-reward-high-quality.html, and Google Penguin has proven especially controversial. In essence, it is a step against some of the methods used in SEO (search engine optimization) that Google has deemed negative, or to use their term, "black hat webspam". What does this mean? The official announcement (above) discusses this in some detail, including terms like "keyword stuffing" and "link schemes", while Google cites its own "quality guidelines". Google punishes the websites it deems guilty by sending them farther down the list of results, and this can have devastating consequences for those involved.

Google Penguin may have had human costs already. Here is a post from one SEO fellow who claims that he will be impoverished http://www.seroundtable.com/google-penguin-casualties-15079.html, and another that Penguin has led to unemployment in parts of the developing world, because so much SEO work takes place in those countries: http://globalvoicesonline.org/2012/05/05/india-google-searchs-change-in-algorithm-and-its-impact/ These reports have not been verified, but they seem like logical consequences. Websites dropping significantly in the results definitely has negative consequences for the businesses affected, along with their employees.

This shows how much power Google has. I wrote about this in another post, “Google and Link Spam” http://blog.jweinheimer.net/2012/01/google-and-link-spam.html about a little store that sells flour in Vermont. Also, since everything on the web has to do with money, I personally suspect that these updates have some relation to the looming Facebook IPO that so many are talking about. https://www.google.com/search?q=facebook+ipo&tbm=nws

In Google terms, “quality” in a website is quite a different concept from what a normal person would consider quality, and certainly is radically different from the idea of “quality” in a library's collection, its catalog, or its public services. Google provides basic guidelines for their idea of quality:

  • Make pages primarily for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."
  • Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"
  • Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.
  • Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our Terms of Service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google.
Of course, none of this has anything to do with the general notion of "quality": the actual quality of the information contained in a webpage, whether the information is factual, whether it is obsolete or biased, whether it is based on sound reasoning or superstition. Google's "quality" is related to a type of authenticity, although of a strange type. It is there primarily to protect people from wasting too much of their time on pure advertising. (I think)

Still, many have been left confused. For instance, their guideline "Avoid tricks intended to improve search engine rankings" seems to get rid of SEO altogether. What one person would consider a trick, another would consider a flash of brilliant insight.
SEO is vital, but as with everything, it can be abused. Without SEO, the biggest sites would only continue to rise while the lesser ones would be fated to disappear into the morass.

One case where Google Penguin will demote your site in the search results is if it gets too many inbound links too quickly. This is opposed to the concept of "natural links", which are supposed to build up gradually over time. "Natural links?" This all seems strange to me. What if you come up with the "killer information" that everyone wants, or something you have created suddenly becomes the newest internet meme? Why should you be punished? How long will the punishment last?

Also, it turns out that Google Penguin may actually make it easier for competing websites to harm one another. How can this happen? By one website employing the "black hat webspam" techniques mentioned above, but aiming them at the competition's site instead of its own. Clever, but fairly obvious once you get into that kind of mindset. http://www.webmasterworld.com/google/4451050.htm

Another interesting demotion concerns what are called "exact match domains", domain names based on popular keywords. For instance, if you search "credit cards", the first hit is not Mastercard or the like; it is www.creditcards.com. Everyone is saying that these domains have been demoted, but I haven't seen it yet. "Brittany Spears", "Barack Obama", and "credit cards" all come up on top. A search for "ebooks" has www.ebooks.com come up second to Project Gutenberg.

Much of the impact is still being researched, but it must be understood that "relevance", when speaking of search engines, is quite different from what the same term means in everyday speech.

While I still believe that SEO will eventually become an important issue for catalogers, the example of Google Penguin shows its dangers: what could be found easily yesterday is much more difficult to find today. The patrons of libraries--not to mention the librarians themselves--would find this outrageously complicated, if not bordering on the insane.

The traditional task of the library catalog to provide “reliable results” remains just as crucial as ever, in my opinion. If SEO is to be worked into the library's tools in some way, it must allow for these additional needs. Reward and punishment should not be part of the library's tools.

Tuesday, May 8, 2012

Re: Setting an Example for Academic Research

Comment to Daniel Stuhlman's blog posting Setting an Example for Academic Research, Monday, May 7, 2012.

http://kol-safran.blogspot.it/2012/05/setting-example-for-academic-research.html
 
This shows the changes that are occurring now in the fields of publishing and bibliography. I personally do not like some of these trends, but nevertheless, they are happening. Wikipedia is becoming an accepted and valued source, if not by academia then by the world at large--and increasingly even by academia itself, as you mentioned.

There is no way to stop these trends, and I don't think they are such bad ones. Thirty years ago, if you were stuck in a little town with a poor library--not only in the US but around the world--your options for information were very limited; compared to that, an entire world has been revealed for those people today. A lot of the scholarly tools are hidden behind paywalls (right now, that is, but this too may change), so Wikipedia is certainly far better than a book written 25 or 30 years earlier, which may have been all that your library had. If that.

One task for librarians in their information literacy workshops is to tell students not only about what is available in the databases the library pays for, but also about what is on the open web. Why? Because the students will leave someday, and relatively few will stay in academia. We don't want them left high and dry without anything, and Wikipedia is certainly far better than nothing.

But there is much more than Wikipedia on the web: some very valuable open archives, some great book talks, entire conferences, think-tank publications, and on and on almost without end. Google doesn't find a lot of these great resources, since Google contains far from everything, and even when these materials are in Google you may not be able to find them unless you know how to search--consequently, you must have some skills. Better search tools would help too.

There is so much librarians could do to improve matters today, but I don't know if it will turn out that way. 

Monday, May 7, 2012

Re: [ACAT] Who actually needs to understand FRBR principles?

Posting to Autocat

On 01/05/2012 15:40, Kathleen Lamantia wrote:
<snip>
I have heard several experienced presenters opine that a wide percentage of library staff need to be made familiar with FRBR principles, at a minimum the WEMI concepts. I am extremely hesitant to do this. I understand WEMI; I do not find it helpful in any way, and I don't think my staff or patrons will either.

I work in a public library. Our OPAC has icons which show material type (derived from Field 30 Mat Type in III). Patrons use these to select which format they desire.

Our staff uses the gmd field when they are assisting patrons in searching. We plan to continue to use this. 336, 337 and 338 may or may not be suppressed, we haven't decided yet.

 I understand that RDA is related to the wider world, and that its mission is to make our data not only useful to our own patrons, but to enable searching of it by the entire online community. However, neither my staff nor my patrons are aware of this, and if I do not serve them effectively, my OPAC won't last long anyway.

 So, my question is: Which staff members/departments do you believe should be taught FRBR/WEMI - and why?
</snip>
I think everyone needs to have a profound understanding of the changes RDA mandates concerning abbreviations. As an added attraction, I would suggest that cataloging departments organize a lively debate concerning precisely why "cm" is considered a symbol and not an abbreviation. This debate could be based on the point-counterpoint method.

Just kidding....  :-)

But to be serious, in a meeting with the staff, it will be important for everyone to have an idea of what changes they will see with RDA. These will be very small changes and will raise the inevitable question, "If this is all there is, why is this happening?" These questions could be highly pointed because the costs for implementation will probably mean that at least some of it will be coming out of the budgets of other departments. The only way that it makes any sense is to say: it is on the road to FRBR. And then *that* opens up the can of worms.

Once again, this is one of the unavoidable costs of refusing to come up with a valid business case. Every library will have to come up with one on their own, but somebody, sooner or later, has to come up with decent reasons other than visions of a wonderful new future.