Friday, September 30, 2011

RE: Super MARC to code RDA?

Posting to RDA-L

On Fri, Sep 30, 2011 at 12:42 AM, Kevin M Randall wrote:
<snip>
By "one catalog", are you referring to that little thing I keep bringing up, the Ex Libris Voyager system? That is one product, but many thousands of "catalogs" around the world. (Including a catalog at this quaint place you may have heard of, the Library of Congress.)

Why the records are stored (and used) in Voyager in that format, I don't know for sure. But I can only assume it is because that happens to be the most efficient way of using system resources. Yes, there are many other tables to support indexing, but the full bib record exists ONLY in the MARC 21 ISO2709 format (albeit placed within a field of an Oracle table, and sometimes broken up into multiple rows of the table, depending on length of the ISO2709 string). When exporting, the records are not "recompiled" but are rather copied directly from the ISO2709 strings.

Actually, I would be rather surprised if it turns out that Voyager is alone among the major players.
</snip>
I have no special love or hatred for the ISO2709 version of the record; I honestly couldn't care less. We could retain it as our communications format IF we could show that it is as flexible as XML and also widely used by the different software developers out there. I don't see that happening with ISO2709. Besides, how a specific database or catalog stores its information internally is of practically no concern to catalogers, though it is a concern to the database designers. What catalogers should care about is that the system stores and retrieves everything reliably.
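For readers who have not seen the kind of storage Kevin describes, here is a minimal Python sketch of the idea: one ISO2709 string is cut into fixed-size segments stored as numbered rows, and "export" simply glues the segments back together without recompiling anything. The segment size, the row layout, and the function names are all hypothetical; this is not Voyager's actual Oracle schema.

```python
# Hypothetical segment size: the real column size in Voyager is not stated here.
SEGMENT_SIZE = 990

Row = tuple[int, int, str]  # (record_id, sequence_number, segment)


def split_into_rows(record_id: int, iso2709: str, size: int = SEGMENT_SIZE) -> list[Row]:
    """Cut one ISO2709 string into numbered segments, one table row per segment."""
    return [
        (record_id, seq, iso2709[start:start + size])
        for seq, start in enumerate(range(0, len(iso2709), size), start=1)
    ]


def export_record(rows: list[Row]) -> str:
    """Rebuild the record by concatenating segments in order; nothing is recompiled."""
    return "".join(segment for _, _, segment in sorted(rows, key=lambda row: row[1]))


# Usage: a 2,500-character record becomes three rows and comes back unchanged.
record = "x" * 2500
rows = split_into_rows(1, record)
print(len(rows))                       # 3
print(export_record(rows) == record)   # True
```

Either way, as the reply above says, this is a matter for the database designers; the cataloger never sees it.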
<snip>
While I also agree that numbers and other language-neutral tags have their advantages, I really don't think it's necessary to have them in a new metadata carrier. If things are done right this time around, catalogers will NOT, NOT, NOT be working with records in the "native" language of the metadata carrier. Just as there is absolutely no excuse for requiring catalogers in this day and age to have to work with MARC tags, indicators, and subfield codes, there should be absolutely no excuse for requiring them to work with constructs such as (to quote an example from Diane Hillmann's "Getting Real with RDA" presentation):
<rdarole:author>http://lcnaf.info/79062641</rdarole:author>
That is how it might look behind the scenes, but the cataloger should NEVER have to see this unless it's explicitly asked for! But if that's what catalogers end up being given to work with, then I will really be convinced that systems vendors really do have the utmost contempt for catalogers...
</snip>
I don't know if I agree with this. With codes and numbers, everybody knows exactly what it all means. With words, it gets messy. For instance, I discovered that in iTunes U people can add metadata (http://tinyurl.com/5v3gz5p). If you look at it, you find a table of suggested uses for the fields, and one is highly interesting:

"Name: Track title, for example, Easter Island and Darwin or Digital Storytelling"

Name???!!! And then it immediately says "Track title". If this makes little sense to us, imagine someone with very little English trying to figure it out! Standards demand rigor, and the reality is that much of it ends up being communicated in what seems to an untrained person to be gibberish, since the purpose is to communicate very precisely.

So, while I don't care about the coding, be it in words or numbers or musical notes (it's just computer code, after all, and literally the same to the computer!), I do care very much about how people interpret those codes. Someone who sees 245$a will be forced to look it up and find "Title proper", which they will not understand; then they will have to look up what a title proper is, learning along the way about alternative titles, uniform titles, and all kinds of other titles that the non-librarian does not know about. After iTunes U, how will people interpret "Name"?

Yet as I said, I fought the good fight to get people to retain the numbered fields and subfields, but gave up. The method of communication will be in "words". One of my concerns with words comes from a vision I have had: each language group will eventually want to free itself from English and, very logically, demand equality for its own language. Then all these versions will be made, and the final situation will be just as bad as, or worse than, all the versions of MARC...

Oh well, I lost that one.
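The point about numbered fields transcending languages can be sketched in a few lines of Python: the language-neutral code stays the key, and each language community supplies only its own label for it. The label table, its French translations, and the function name are illustrative assumptions; a real system would draw its labels from an official translation.

```python
# Illustrative label table: the language-neutral code is the key, and each
# language community supplies its own human-readable label for it.
FIELD_LABELS: dict[tuple[str, str], dict[str, str]] = {
    ("100", "a"): {"en": "Personal name", "fr": "Nom de personne"},
    ("245", "a"): {"en": "Title proper", "fr": "Titre propre"},
}


def label_for(tag: str, code: str, lang: str, fallback: str = "en") -> str:
    """Return the label for tag$code in the reader's language."""
    labels = FIELD_LABELS.get((tag, code))
    if labels is None:
        return f"{tag}${code}"  # unknown element: show the neutral code itself
    return labels.get(lang, labels[fallback])


print(label_for("245", "a", "fr"))  # Titre propre
print(label_for("245", "a", "de"))  # Title proper (no German label, so English is used)
```

Under this arrangement, a new language would mean a new column of labels rather than yet another version of the format.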

Thursday, September 29, 2011

Re: Super MARC to code RDA?

Posting to RDA-L

On 29/09/2011 21:33, Jonathan Rochkind wrote:
<snip>
I am suspicious of your claim that either catalogers or programmers have any special facility with binary MARC. Certainly catalogers, but even most programmers working with MARC use pre-built libraries for reading/writing binary marc, and have no facility with it themselves. (With gratitude, because it is a _bear_ to work with).

To make sure we're talking about the same thing, here is a literal binary MARC record. Is this really something you think catalogers have any familiarity or skill to transfer? If you are looking at anything other than something that looks like what I paste below, you are not looking at the actual MARC format, but a representation of it that could be created from anything that uses the same schema/vocabulary as marc (for instance MarcXML or a marc-in-json).

This is what an actual binary marc record looks like, in the format you are talking about expanding to take for instance three-digit tags. This is not a format that anyone can write or read by hand, it is not something catalogers actually look at. This is not an error in copy and paste into this email, this is really what binary marc is (yes, it's all one line -- it's possible my email client or yours will try to separate it into multiple lines of reasonable length. If that happens, it's no longer valid binary marc, this should be all one line).

[an ISO2709 MARC record]
</snip>
Yes, and it is also important to keep in mind that in modern catalogs, the only time this form of the record exists is when it is transferred (or communicated) from one library catalog into another. That is why it is called a "communication format". Once the record is in your own catalog, it is reworked in all kinds of ways, normally into some type of relational database format. In Koha, one cell of the relational database contains the entire MARC record in MARCXML. From there, Koha reworks the information into a series of indexes using a system called Zebra, which makes possible the headings you see in the left column after you do a search, e.g. http://catalog.ccfls.org/cgi-bin/koha/opac-search.pl?q=rolling+stones

When someone else wants the record from your catalog, the computer recompiles it into ISO2709 format and sends it on. Someone mentioned that in one catalog the ISO2709 format is retained, but I don't know why.
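To give an idea of what "recompiling" involves, here is a minimal Python sketch of writing and reading an ISO2709 record, assuming the standard MARC 21 layout: a 24-character leader (record length in positions 0-4, base address of data in positions 12-16), a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position relative to the base address), then the field data, with 0x1E ending each field, 0x1F introducing each subfield, and 0x1D ending the record. The function names and the fixed leader values are assumptions for illustration; in practice one would use an existing library such as pymarc.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = "\x1f"


def data_field(ind1: str, ind2: str, subfields: list[tuple[str, str]]) -> str:
    """Two indicators followed by delimiter + code + value for each subfield."""
    return ind1 + ind2 + "".join(SUBFIELD_DELIMITER + code + value for code, value in subfields)


def build_record(fields: list[tuple[str, str]]) -> bytes:
    """'Compile' (tag, content) pairs into one ISO2709 byte string."""
    directory, data = b"", b""
    for tag, content in fields:
        body = content.encode("utf-8") + FIELD_TERMINATOR
        # Directory entry: tag (3), field length (4), start relative to base address (5).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)       # leader + directory: where the field data begins
    length = base + len(data) + 1    # + 1 for the record terminator
    # Assumed fixed leader values: new record, language material, monograph, Unicode.
    leader = f"{length:05d}nam a22{base:05d} a 4500"
    return leader.encode("ascii") + directory + data + RECORD_TERMINATOR


def parse_record(raw: bytes) -> tuple[str, list[tuple[str, str]]]:
    """'Decompile' an ISO2709 byte string back into its leader and fields."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]     # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop the field terminator
        fields.append((tag, body.decode("utf-8")))
    return leader, fields


raw = build_record([
    ("001", "12345"),
    ("245", data_field("1", "0", [("a", "Title proper :"), ("b", "other title")])),
])
print(raw)                       # one unbroken run of bytes; it contains no newlines
print(parse_record(raw)[1][0])   # ('001', '12345')
```

Every length and offset is counted in bytes, which is exactly why inserting a line break anywhere, as the quoted message warns, corrupts the record.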

So today, this form of MARC exists for just a few milliseconds while it is compiled, transferred, and decompiled. The only software that uses it is software designed for libraries, not for transferring information over the World Wide Web. Browsers can't use it, spreadsheets can't use it, word processors can't use it, citation management software can't use it, at least not until it is converted into another format, such as a type of XML. MARC served its purpose and should now be laid aside.
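Once the same record is expressed as XML, general-purpose tools can read it. As a minimal sketch using only Python's standard library, the function below turns (tag, content) pairs, with 0x1F subfield delimiters as in ISO2709, into MARCXML, assuming the MARC 21 XML namespace (http://www.loc.gov/MARC21/slim) and its element names (record, leader, controlfield, datafield, subfield). The function names are hypothetical, and the sketch ignores details such as converting older MARC-8 character encoding.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARCXML_NS)  # write MARCXML as the default namespace


def q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def to_marcxml(leader: str, fields: list[tuple[str, str]]) -> str:
    """Turn (tag, content) pairs, with 0x1F subfield delimiters, into a MARCXML record."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, content in fields:
        if tag.startswith("00"):  # control fields 001-009 have no indicators or subfields
            ET.SubElement(record, q("controlfield"), {"tag": tag}).text = content
            continue
        field = ET.SubElement(
            record, q("datafield"), {"tag": tag, "ind1": content[0], "ind2": content[1]}
        )
        for chunk in content[2:].split("\x1f")[1:]:  # each chunk is code + value
            ET.SubElement(field, q("subfield"), {"code": chunk[0]}).text = chunk[1:]
    return ET.tostring(record, encoding="unicode")


print(to_marcxml("00000nam a2200000 a 4500", [
    ("001", "12345"),
    ("245", "10\x1faTitle proper :\x1fbother title"),
]))
```

The output is ordinary XML text, so a browser, a spreadsheet import, or any XML library can open it without knowing anything about ISO2709.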

I agree that the numbers of the fields and subfields transcend languages, but I fought that fight and lost. Developers refuse to work with the numbers and insist on the names--even in English. OK. Let them have it--they want our records and we want to share them (I hope). Let's save our strength for other battles and move forward.

Re: Technology advances

Posting to NGC4LIB

On 28/09/2011 22:00, Joe Hourcle wrote:
<snip>
And a lot of it's not in books (or journals, or other bibliographic materials), and never will be.

... but it still needs to be collected, cataloged, preserved, etc.

All of the skills of a librarian apply, it's just on something other than books.
As for the library as an organization, you still need a place to store the collected stuff. (and for the stuff I deal with, I still feel I'm closer to a librarian than an archivist, so I can't say that place is an archive, even if that's what most people call us)
</snip>
As the library's holdings become more virtual, it seems to follow logically that the library itself will become more virtual as well. If the "library-as-a-place" continues to exist, it will probably become more of a place for people to meet, e.g. for group and town meetings, and perhaps also a restful sanctuary for personal reflection; naturally, it will be a place to get a decent cup of coffee.

But the idea of the library as a physical place to find information (the collection) and where I can find the answers to my questions (reference) is disappearing even now. It's amazing how quickly this has changed!

Nevertheless, I do not believe either that materials can organize themselves or that a mathematical formula can do it, no matter if just looking at that formula will make your hair stand on end and leave you speechless for a couple of days! Relying on a tool to determine something as vague as "relevance", a tool that can be manipulated in all kinds of extremely clever ways to serve the greed or the propaganda of unknown people (read "search engine optimization"), is a truly frightening prospect.

If librarians play it right, there will be plenty of need for their skills and ethics. But I don't know--it is a very difficult time for everyone.

Wednesday, September 28, 2011

Technology advances

Posting to NGC4LIB

"Playing With Fire: Amazon Launches $200 Tablet, Slashes Kindle Prices" http://www.wired.com/epicenter/2011/09/amazon/

So, a tablet for $199 and a Kindle for $79? We are entering territory where almost everyone can afford one. For the Kindle, I compared its price with the average book prices (given in British pounds) that I found at http://www.holtjackson.co.uk/cgi-perl/web_avg_book_price.pl. Converted to US dollars, the average prices were:
Adult fiction: 16.15 USD
Adult nonfiction: 30.79 USD
Children's fiction: 10.59 USD
Children's nonfiction: 13.69 USD

From this, it seems that anywhere from roughly 2.5 to 7.5 books will pay for one of these new Kindles, that is, so long as the books you then read are free public domain titles. For a Kindle Fire, it will be anywhere from roughly 6.5 to 19 books, and with a tablet, of course, you can also listen to music, watch movies, surf the web, and much more.
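The arithmetic behind these estimates is just the device price divided by an average book price. A short Python sketch using the converted averages listed above (the dictionary and function names are only for illustration):

```python
# Average prices as listed above, already converted to US dollars.
AVERAGE_BOOK_PRICES_USD = {
    "Adult fiction": 16.15,
    "Adult nonfiction": 30.79,
    "Children's fiction": 10.59,
    "Children's nonfiction": 13.69,
}
DEVICE_PRICES_USD = {"Kindle": 79, "Kindle Fire": 199}


def books_to_break_even(device_price: float, book_price: float) -> float:
    """How many average-priced books cost as much as the device."""
    return device_price / book_price


for device, price in DEVICE_PRICES_USD.items():
    counts = [books_to_break_even(price, p) for p in AVERAGE_BOOK_PRICES_USD.values()]
    print(f"{device}: {min(counts):.1f} to {max(counts):.1f} books")
# Kindle: 2.6 to 7.5 books
# Kindle Fire: 6.5 to 18.8 books
```

The low end of each range comes from adult nonfiction (the most expensive category) and the high end from children's fiction (the cheapest).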

Prices for the ebook versions are still not much cheaper than for print--hard copy at that! But there is less and less of a hurdle to people buying ebooks and tablets, and we can all assume the prices will go down even more. People will want to borrow ebooks from the library, but maybe Amazon will do that, too. See "The birth of the Kindle Fire and the death of the public library" http://www.extremetech.com/computing/97335-the-birth-of-the-kindle-tablet-and-the-death-of-the-public-library?utm_source=rss&utm_medium=rss&utm_campaign=the-birth-of-the-kindle-tablet-and-the-death-of-the-public-library where the author writes,
"Amazon announced that Prime subscribers (free, two-day shipping for $80 per year) will also get free access to almost 3,000 Fox TV shows and movies — an awesome prospect for any web surfer or tablet user — but more importantly, there are tantalizing hints that Prime users will also get access to a Kindle e-book library.
Let that sink in for a moment: for $80 per year, you would get unlimited library-like access to Amazon’s e-books. That’s the cost of 10 paperback books — and ignoring the fact of whether you can read 10 books per year, let’s not forget that you also get free shipping and Fox TV shows and movies for the same $80."

Changes are taking place at a bewildering pace now. How can libraries fit in, or even keep up?

I think that there is a huge need for librarians, but the field needs to take stock to figure out what it is that we provide that is genuinely unique, and build on those strengths.

Re: [ACAT] Objection to author's birth year

Posting to Autocat

On 27/09/2011 22:34, Mary Mastraccio wrote:
<snip>
I agree. I was addressing the issue of putting non-name data in the name field.

Actually, the last time I looked, VIAF, supplies an authorized form for various countries with variants. In other words there still is a Preferred/Common-usage/authorized form but it can be different in different settings (language/country/cultural). Some have suggested that we not even specify a preferred form of name but just show all variants of a name. I prefer a common-usage/preferred form [within each thesauri] so programmers know what to display when I supply the URI in my bib record. In other words the Chinese form can be different than the English form but each language/national NAF has a preferred/authorized form which will be displayed in the local catalog based on language (or some defined) preference.
</snip>
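The arrangement Mary describes can be sketched in a few lines of Python: the bibliographic record stores only an identifier, each language file supplies its own preferred form, and the local catalog picks a form according to the user's language preference, falling back to a default. The identifier URI, the data table, and the function name are hypothetical; the two forms shown are only illustrative.

```python
# Hypothetical authority data: one identifier, with a preferred form
# supplied by each language/national authority file.
PREFERRED_FORMS: dict[str, dict[str, str]] = {
    "http://example.org/names/confucius": {"en": "Confucius", "zh": "孔子"},
}


def display_label(uri: str, preferences: list[str], default_lang: str = "en") -> str:
    """Pick the preferred form for the first language the user prefers."""
    forms = PREFERRED_FORMS.get(uri, {})
    for lang in preferences:
        if lang in forms:
            return forms[lang]
    return forms.get(default_lang, uri)  # last resort: show the identifier itself


uri = "http://example.org/names/confucius"
print(display_label(uri, ["zh", "en"]))  # 孔子
print(display_label(uri, ["fr"]))        # Confucius
```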
I agree too. In fact, I think most of us are pretty much in agreement. In reply to Brian's comment:
<snip>
What exactly is it that our users want in this "new environment?" Each of us propounds the way things should be for our users with precious little evidence to support such (aside from OCLC's user study).
</snip>
I don't know. Nobody knows, because there hasn't been enough research (at least outside of Google ... [et al.]), and the environment is changing so quickly anyway that any conclusions from research done today will probably not be valid in just a year or two. That is what Google lives with, and it seems to be the nature of the changeable times we are experiencing.

Once again, holding on to the pronouncements of RDA and FRBR seems more and more like a carpenter of the early 20th century who stubbornly insists on keeping his hand tools when power tools of every kind are constantly coming out. I understand his predicament, because buying the newest power tools involves costs: for the tools, for additional electricity, for learning other skills, and so on. In addition, he knows that these latest power tools he is spending his hard-earned money on will probably be superseded rather quickly, and he will be forced to buy even more costly tools in just a couple of years, starting the cycle all over again. Yet if he doesn't buy those power tools and learn how to use them, he remains stuck in the world of the 19th century until he either retires or passes away.

We shouldn't get ourselves into that situation.

Re: Objection to author's birth year

Posting to Autocat

On 27/09/2011 20:15, Mary Mastraccio wrote:
<snip>
The issue is that the established/authorized form includes subfields with data that is provided by other fields. The authorized form of a persons name should not include his occupation or field of activity, and does not need to include a fuller form of name or associated dates. The authorized form of a name should be just the name, with other identifying information in other fields that can be displayed as needed. If the data structure (authority record) is defined in an efficient way there will be no end of examples of systems/utilities that can index and display the information as many have suggested.
</snip>
This is correct about display, but even the concept of "authorized form" begins to disappear in the new environment. Here is Lenin in VIAF, http://viaf.org/viaf/7393146/, and any, or even all, of these forms could be his "authorized heading". The collation function of a heading (i.e. bringing related records together in a consistent manner) is being divorced from the label it carries. The VIAF example is a great case in point: any system that is correctly configured could easily display the Russian form, the Arabic, or whatever you would want.
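Divorcing collation from the label can be sketched in a few lines of Python: records are gathered by the identifier they carry, not by the heading string that happens to be in them, so records labeled in different scripts still end up together. Only the VIAF URI comes from the example above; the record data, the second identifier, and the function name are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative records: the heading strings differ by script, but the first two
# carry the same identifier (the VIAF URI for Lenin cited above).
LENIN = "http://viaf.org/viaf/7393146"
RECORDS = [
    {"title": "What is to be done?", "agent": LENIN,
     "heading": "Lenin, Vladimir Ilʹich, 1870-1924"},
    {"title": "Государство и революция", "agent": LENIN,
     "heading": "Ленин, Владимир Ильич, 1870-1924"},
    {"title": "A hypothetical title", "agent": "http://example.org/person/42",
     "heading": "Example, Person"},
]


def collate(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group titles by agent identifier, ignoring whatever label each record carries."""
    groups = defaultdict(list)
    for record in records:
        groups[record["agent"]].append(record["title"])
    return dict(groups)


for agent, titles in collate(RECORDS).items():
    print(agent, titles)
```

Which label is then shown for each group becomes a separate display decision.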

But that doesn't exhaust the possibilities. Here is an example of an old catalog practice, this one from the famous catalogue of the Bodleian at Oxford compiled by Thomas Hyde (who, by the way, wrote down some of the first real cataloging rules!), the "Catalogus impressorum librorum Bibliothecae Bodleianae..." of 1674. Here we see a rather remarkable heading that included the cross-references! "Rogerus Baconus, seu Bachonus sive Bacconus" http://books.google.com/books?id=CKZFAAAAcAAJ&pg=PA59#v=onepage&q&f=false.

And the cross-reference from Bacconus to Baconus is on the previous page http://books.google.com/books?id=CKZFAAAAcAAJ&pg=PA58#v=onepage&q&f=false, while the reference from Bachon seems to be subsumed under a more general reference to the surname Bacon. Hard to say without looking at his rules more closely. (It's fabulous to see these rare materials online, and so well scanned, too!) I always kind of liked Hyde's headings because there was more information, plus you could see some of the work done behind the scenes--something that would make sense today.

There are so many ways of handling matters, and it would seem to make sense to actually ask the public. After all, aren't we supposed to be making these things for them?

Tuesday, September 27, 2011

Re: Objection to author's birth year

Posting to Autocat

On 26/09/2011 22:58, Prejsnar, Mark wrote:
<snip>
This disambiguation approach still relies to some significant extent on dates, however (which leaves the original concern unresolved). Would you want to try to be certain you were attributing a title to the *right* John Johnson if only the 2 or 3 word description of vocation were available?
</snip>

Why does it have to be *only* the 2 or 3 word description? There are almost untold possibilities available today.

I didn't mean to imply that dates weren't useful, but wanted to compare using only the dates vs. using additional information. Using only the dates made perfect sense for a long, long time, but those days are gone now. At some point, as I keep pointing out, we must begin to look seriously at the catalog record, the search results, and all of the catalog's functions through the eyes of our patrons, and not just through our own eyes. While we understand how authority control works and the kind of information in each of the various records, most patrons couldn't care less about any of it. They would rather forgo any interaction with the catalog at all and go straight to examining the resources. Reference librarians experience this every day when they are asked, "Where are your books on business (or art, or law, or whatever)?"

In addition, the public has far more experience of tools such as Wikipedia, Google Scholar, YouTube, iTunes, and all kinds of more specialized sites that are built on the premise of being "easy to use", which is certainly not the premise of the library catalog. It is only natural that people compare the tools they see and come to some conclusions.

At the same time, I think it's obvious that what people *believe they want* is not what they *really do want*, e.g. people really do want to search for the *concept* Dostoyevsky, but we understand that the reality of such a search is far more complex than a general layperson will think it is. All this is complex for a huge variety of reasons that librarians, and especially catalogers, will readily understand. 

People want many of the controls that a catalog provides. Still, it is vital that we look at records and search results through the eyes of people who are not expert in the catalog and have absolutely no desire, if not outright antipathy, to become experts. This is why I am so much against both RDA and FRBR, since they both continue the ancient mindset.

Monday, September 26, 2011

Re: Objection to author's birth year

Posting to Autocat

On 26/09/2011 17:38, J. McRee Elrod wrote:
<snip>
What do you do when contacted by an author objecting to the birth year in main entry? In the most recent case, there are three records in Amicus with the birth year in the entry.

So far I'm refusing on the basis of national standards, and referring her to Library and Archives Canada.
</snip>
Adding birth dates to records exasperates many people, and this seems to be a good opportunity, once again, to bring up better methods of resolving conflicts than the ones we have traditionally used. I personally find the disambiguation page in Wikipedia to be vastly superior to the traditional cataloging method of using dates of birth to break conflicts. While that method serves the purpose of keeping authors and their works separate, it does not help searchers very much. Compare e.g. "John Johnson" in Wikipedia, http://en.wikipedia.org/wiki/John_Johnson, and in the LCNAF, http://tinyurl.com/6yo9ys2.

The Wikipedia method uses meaningful information, describing each person as an artist, a military figure, a politician, etc., which tells the searcher far more than the dates alone, as in the LCNAF:
Johnson, John, 1662-1725
Johnson, John, 1706-1791
Johnson, John, 1732-1814
Johnson, John, 1759-1833
Johnson, John, 1766-1829
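If the name, dates, and a short description are kept in separate elements of the authority record, the same data can produce either the traditional heading or a Wikipedia-style disambiguation line. A minimal Python sketch; the Person class, the function names, and the sample person are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical authority data, with each element kept in its own field."""
    surname: str
    forename: str
    dates: str
    description: str  # a short, meaningful phrase, as on a Wikipedia disambiguation page


def heading(p: Person) -> str:
    """Traditional heading: the dates alone break the conflict."""
    return f"{p.surname}, {p.forename}, {p.dates}"


def disambiguation_entry(p: Person) -> str:
    """Wikipedia-style line: the description tells the searcher who this is."""
    return f"{p.forename} {p.surname} ({p.dates}), {p.description}"


sample = Person("Doe", "John", "1700-1780", "English painter")  # hypothetical person
print(heading(sample))               # Doe, John, 1700-1780
print(disambiguation_entry(sample))  # John Doe (1700-1780), English painter
```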

Patrons see and use these different methods and in this case, it's pretty easy to predict which they will prefer. Then they compare the different ways and draw conclusions as to which is better and more useful to their needs.

I think we could learn a lot from tools such as Wikipedia.

Saturday, September 24, 2011

Re: HathiTrust & five universities sued (cont.)

Posting to Autocat

On 23/09/2011 15:46, Mike Tribby wrote:
<snip>
[Aaron Kuperman wrote:]
"Given the option now available to authors to directly sell their books on line as digital downloads, I suggest that any author who chooses to assert copyright (as opposed to granting permission to anyone to republish, which is easily done) has reason to object to anyone else making his book available. Indeed, book publishers might be obsolete in their current form since unless they are paying a large fee, most authors can probably make more money by selling the book in digital format - meaning they are in direct competition with anyone who wants to digitize and distribute their materials. Thus the situation is being transformed into one in which a library has the option of buying a digital copy, chooses not to do so, and then produces its own digital copy in competition with the author. No wonder the authors are not amused."
I fully agree with Aaron on this. For background one might look in on the recent discussion of this lawsuit on the Videolib discussion list, a list with a significant population of subscribers who are producers of film and video works. Their perspective on this issue is much different from that of the "Library Groups" mentioned in the article Jim cited, as it pertains to their livelihood vis-a-vis the right to control access to their works. As I may have previously mentioned to Jim both on- and offlist in the past, full-text for the asking online may be an idea that will have to wait for the fall of capitalism and the magical change in the proletariat's attitude that accompanies their ascendancy and eliminates greed and thoughts of personal gain from the human consciousness.

In the past, my colleague Bryan Baldus has suggested looking at the discussion of this issue on a couple of publishing-related email discussion lists (PubForum and Publish-L) for the perspective of some of the publishers and authors affected by this lawsuit. I assume my suggestion of looking at the Videolib discussion will result in a similar resounding silence in this august forum. And a disclaimer-- I'm not an entirely disinterested observer here. My reviews are repackaged and resold with no further remuneration to me from the publishers (in one case, ALA). Sure it's gratifying to occasionally see my name on Amazon, but a little cash would be nice, too. The company I work my day job for is prohibited from freely accessing my reviews for our website when we carry works I've reviewed.
</snip>
Yes, Mike and I have gone back and forth on this issue, mostly privately, and I continue to point out that it is becoming clearer to everyone concerned that the interests of the authors and the interests of the publishers are *not at all* the same. Much of this, though not all, is based on a change in technology that is undermining the traditional model, in which someone writes a book or article; the author sends it to a publisher, who does a decreasing amount of work on the author's creation; the publisher prints and binds physical copies to send to bookstores or other similar retail outlets around the world; and people are supposed to buy whatever finds its way to their stores. This model, which made perfect sense in earlier times, is becoming increasingly obsolete today and an impediment in all kinds of ways. Publishers, who have always controlled matters, do not want this situation to change, or if it does change, they want to retain the vast majority of the power. Left to themselves, the publishers have made it very clear that these out-of-print works of their authors will never be reprinted until the end of time (or until copyright runs out), and the authors will never get any more money. It is only due to the efforts of Google and libraries that the issues are even being raised.

Especially when it comes to scholarly publications, almost none of the creators make any money at all; the ones who do are the publishers, who often make outrageous amounts of cash. I personally don't think this has much to do with capitalism per se. I think it has much more to do with a fundamental change in technology that is causing a rapidly increasing breakdown in the traditional system of publishing printed materials, a system that has had less and less to do with making money for the *creators* and much more to do with enriching the publishers themselves. I don't think any of this is controversial.

If the publishers are so concerned about the authors making money, then let them prove it by printing those books again. But they choose not to. That's fine; it's their choice. But clearly they decide according to *their own* interests, and the interests of their authors, readers, and society in general can go hang.

Once again, copyright is not fulfilling its original purpose: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries." There is a lot contained within this single sentence. I agree with the basic ideas it contains, but the current system of publication does not achieve it. It doesn't mean that everything has to be free, but different business models must be found and implemented. The public obviously wants these materials. The authors want to furnish them. But the publishers do not want to supply them. Each has fully valid reasons. And it isn't that the current situation just popped up overnight: it's been on the horizon for twenty years, anyway.

I still believe that a lot of money will be there for whoever comes up with a new business model, but the old one is broken beyond repair and only serves to make everyone angry. I don't see how it can go on indefinitely without becoming either a huge drag on "the Progress of Science and useful Arts", as we are just beginning to see now, or even worse, just being ignored and hurting lots of people in the process.

Friday, September 23, 2011

HathiTrust & five universities sued (cont.)

Posting to Autocat

Concerning this discussion, there is the Chronicle blog posting: "Library Groups Condemn Authors Guild Lawsuit Over Digitized Books" http://chronicle.com/blogs/ticker/library-groups-condemn-authors-guild-lawsuit-over-digitized-books/36280

Bravo to these libraries and their principled stance. Their statement "It is deplorable that eight authors and three special interest groups are trying to dismantle this invaluable resource out of a misplaced fear of the digital future," is absolutely correct. Times are changing radically and very quickly, and someone must speak for the actual "consumers" of information.

As I wrote before, I find it very revealing that the anger is focused on *libraries* who are actually trying to raise the demand for their materials, and not directed at the *publishers* who have decided against printing and selling their books.

Sunday, September 18, 2011

Re: HathiTrust & five universities sued

Posting to Autocat

On 16/09/2011 13:47, Aaron Kuperman wrote:
<snip>
But in that situation, it is very clear that the journal is aggressively asserting its rights - so by any standard, no one would have the right to digitize and distribute the article (other than with a license purchased from the publisher). Remember the issue is over "orphan" publications, not publications where the copyright owner is active and aggressively enforcing copyright.

The solution to that problem would be to publish in a journal or online forum that doesn't make such demands, and for universities to make clear that such publication qualifies under "publish or perish" rules. It is the universities that should be under attack for not accepting non-commercial publications, rather than attacking the publishers who want to make a profit (or complaining about Google trying to make a profit, which is their job as a for-profit corporation).
</snip>
You are right, Aaron; I was discussing copyright more generally. Still, there seems to be a huge difference between how materials on the web and materials not on the web are treated, and the reasons are not clear to me.

Nobody seems to mind that Google ... [et al.] "scarfs up" all the web documents so that others can find them. The responsibility for not being included is placed on the webmasters/creators, since they are supposed to add a "robots.txt" exclusion if they do not want their materials included in Google or other search engines. In effect, this exclusion means that no one except the creator's close friends and family will ever know the website exists, which makes no sense for someone who wants to make money from their creations.
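For readers unfamiliar with it, robots.txt is a plain-text file at the root of a website that well-behaved crawlers check before fetching pages; honoring it is a convention, not something the file itself can enforce. Python's standard library includes a parser for it, and a minimal sketch of the opt-out check looks like this. The site, the paths, and the crawler name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that opts one directory out of all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching each page.
print(parser.can_fetch("ExampleBot", "https://example.org/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "https://example.org/public/index.html"))    # True
```

Everything not explicitly excluded is treated as fair game, which is the opt-out model described above.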

With printed materials, however, the responsibility is reversed and appears to lie with Google ... [et al.], since it is up to them to try to hunt down the copyright owners before they can do anything at all. This arrangement could never have worked with web materials, of course, since the logistics of getting permissions would make everything practically impossible. Yet with printed materials, where getting permissions would seem to be even more difficult, it is taken for granted that the responsibility is on Google ... [et al.] to get permissions before doing anything. To me, such a situation makes no sense at all and is only a recipe for dysfunction and eventual breakdown.

At least Google was more than willing to share their profits. Too bad the agreement was not approved, but what is done is done. Once again, I say that copyright no longer serves its original purpose of "promot[ing] the Progress of Science and useful Arts" and will have to change. There is nothing wrong with this; it should be welcomed by all.

It is a great example of Jefferson's words, inscribed on the Jefferson Memorial:
"I am not an advocate for frequent changes in laws and constitutions, but laws and institutions must go hand in hand with the progress of the human mind. As that becomes more developed, more enlightened, as new discoveries are made, new truths discovered and manners and opinions change, with the change of circumstances, institutions must advance also to keep pace with the times. We might as well require a man to wear still the coat which fitted him when a boy as civilized society to remain ever under the regimen of their barbarous ancestors."
Discarding that old "coat" has relevance not only to society, but also to libraries and even their catalogs!

Friday, September 16, 2011

Re: HathiTrust & five universities sued

On Thu, Sep 15, 2011 at 11:40 PM, Aaron Kuperman  wrote:
<snip>A scholar can put a simple note on the t.p. verso granting anyone the right to republish (presumably giving him credit).  It is very easy to do. If the author doesn't do so, doesn't that mean that he is saying "you do not have permission to publish this without paying me". With the option of publishing online, it is easy to get your work published.
</snip>
For open access/open archives, that is correct, but otherwise, if you want to publish in a scholarly journal (which most people do), then 99.9% of the time you have to sign away your own rights to the publisher. This means that even you have to get permission to use your own articles. Also, if someone wants to use your writings, any money goes to the publisher and not to you. I mean, how many scholars have gotten any money from their content in the Ebsco, Elsevier, Baker & Taylor, etc. databases? I certainly haven't gotten a penny. And those businesses are definitely making a lot of money.

Just within the last few years, publishers have been more or less forced by the scholarly community to allow authors to put one copy into an open archive. This is the purpose of the Sherpa/Romeo list http://www.sherpa.ac.uk/romeo/, which lets authors know what their own rights are, e.g. for "Library Trends" http://www.sherpa.ac.uk/romeo/search.php?jtitle=library+trends&issn=0024-2594&zetocpub=Johns+Hopkins+University+Press&romeopub=Johns+Hopkins+University+Press&fIDnum=|&mode=simple&la=en&version=&source=journal&sourceid=8599 (We discover that this is a good journal by the way!)

The development of the open access movement has been fascinating to me, and the associated responses on the part of publishers to limit access, through the Google Books non-agreement, ILL, copyright issues, and so on, are just as interesting. In my view, we are witnessing historic, evolutionary changes in the information environment. Librarians, for better or worse, are pretty much forced to watch from the sidelines while the big, huge players such as the publishers and Google-type companies work things out.

The question is: will these big, huge players turn out to be like the dinosaurs or not? All libraries can do is emulate our little mammalian predecessors back then: scurry away to avoid being crushed in their fights and, above all, adapt to whatever the situation becomes.

Re: HathiTrust & five universities sued

Posting to Autocat
<snip>
Article I, Section 8, Clause 8 of the United States Constitution, known as the Copyright Clause: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."
</snip>
I am not a great fan of the so-called "original intent" reading of this kind of law, and I am certainly no expert, but one thing that has always struck me about the Constitution is that the founders went to great pains to describe why they decided to make each law. In this case, they prefaced it all with "To promote the Progress of Science and useful Arts", which seems like a positive idea to me. And because of the overriding goal of continuing this "Progress", the founders wanted to secure rights to the authors and inventors.

Therefore, ensuring that authors and inventors have rights to their discoveries and writings was *not* to ensure that they would get rich, but rather to ensure progress for the society, because for authors to write, or inventors to discover, they must be properly remunerated. All this is fine and I have no problems with that.

But today, this has changed, especially for the scholarly community. If a scholar publishes an article, he or she receives precisely nothing, as do the peer reviewers. You also have to give away your future rights to the publisher. Therefore, it is the publisher who is, and will continue to be, enriched by your writings, and this has been the case for a long time now. Consequently, the purpose of the original law has been turned on its head. It all made sense at one time, but its actual effects today must be questioned. Scholars publish articles not for the money from the article or for any rights--which they have been required to renounce--but in order to be cited. It is becoming clear that if you put your article in an open archive free to all, your citation rate will go up, which only makes sense. As a result, many scholarly authors are beginning to realize that they are rewarded more by making their writings freely available.

This is one of the reasons why I think that copyright law must be reconsidered in this new universe we are entering: it is ceasing (or has already ceased) to perform its original function. The authors, at least the scholarly ones, get their rewards today not through their "rights" but through being recognized by others (or through citations). It is the publishers who are rewarded through the "original intent" of the copyright clause.

The current situation seems, at least to me, to be a very logical and almost inevitable outcome of current technology, and we will probably never return to the old ways again. Therefore, everyone becomes angry: the authors, who think they are being rooked in all kinds of ways, the publishers, who are losing the control they once had, the readers, who have all kinds of new expectations, and the librarians, who are caught in the middle of it all.

To return to the purpose of this list--of course, this all has huge consequences for libraries and catalogs, as I have tried to discuss in some of my podcasts.

Tuesday, September 13, 2011

Re: Media terms for kits

Posting to RDA-L

On 13/09/2011 20:49, Brenndorfer, Thomas wrote:
<snip>
Thomas said:
So this would work for a kit:
336 $a text $a moving image $a spoken word
337 $a unmediated $a video $a audio
338 $a volume $a videodisc $a audio disc
...

The problem of too long, unintelligible content terms, misuse of the term "computer" for electronic as a media type,

It's 6 of one/half dozen of the other for "computer" vs "electronic". A CD player is purchased in an "electronics" store. A toaster oven is "electronic".

If anything, it's more accurate. In most common situations, someone will need a "computer" processing intermediary device of some kind to access the content on media created explicitly for such devices. That's quite straightforward.
</snip>
I have been following this thread for a while, and from my point of view, it is a great example of the disconnect between cataloging and the public. Of course the public does not understand what "kit" means in a catalog. They never did. Do you honestly think people understand "digital" or "electronic"? Try Googling the word "kit" and see what you get: http://www.google.com/search?q=kit. Take a look at Google Trends (the latest "hot" searches) and see which terms actually mean something that people could agree on. http://www.google.com/trends/hottrends?sa=X. The one I am looking at includes "ringer" and "build" and "als". I don't know what these terms mean since they could mean almost anything.

This is the reality of the public's everyday "universe of information". The fact is: if somebody doesn't understand something today, they don't begin to question and obsess over what it really means--they just ignore it, go on to the next one and immediately forget about anything they don't understand. Let's face it: if people obsessed over understanding everything they were looking at on the web, they could never use a full-text search. Ignoring is exactly what I do when I see some crazy thing in a search result. I don't think: "Why am I looking at this outrageous thing? Let's see. Maybe I should email the person who is responsible...." I don't think I am alone.

The words "kit" and "electronic resource" and "text" mean something only to librarians, to us. We should admit it. There is nothing wrong with that, since we need our tools to work coherently and in a guaranteed way, but otherwise, let's not think that people place a huge amount of importance on the details of our tools when they do not.

Re: HathiTrust & five universities sued

Posting to Autocat

On 13/09/2011 16:11, Roeder, Randall F wrote:
<snip>
It was bound to happen ...
http://blog.authorsguild.org/2011/09/12/authors-guild-australian-society-of-authors-quebec-writers-union-sue-five-u-s-universities/
</snip>
Thanks for pointing this out. Of course, coming from the Authors Guild, this article is anything but unbiased. The HathiTrust people must have foreseen this and thought about it long and hard.

While I understand that the authors and publishers feel pressed, I think libraries are being pressed just as hard. Look at HarperCollins's outrageous 26-checkout limit http://www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp, and now there are serious problems with ILL for items created outside of the U.S. http://digital-scholarship.org/digitalkoans/2011/08/21/ill-impact-second-circuit-ruling-limits-first-sale-doctrine-to-works-made-in-the-us/

My experience with publishers is that they have always hated ILL; in the words of one publisher, "Interlibrary loan? We call that interlibrary theft!" And sorry, but for those authors whose books have been out of print for a long time, that simple fact speaks volumes. Instead of getting mad at libraries, they should be thanking them for raising interest in their otherwise forgotten books. It would seem much more logical for those authors to sue their publishers to start publishing their books again. Of course, nobody even wants to mention that Amazon's most popular books are free. See "Amazon to Drop Free Books from Kindle Bestseller List" http://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/43152-amazon-to-drop-free-books-from-kindle-bestseller-list.html

They dropped them from their list? Somebody must have been embarrassed about that!

Obviously, "intellectual property" has ceased to fulfill its purpose and is serving only to make everyone concerned unhappy, from authors, to publishers, to libraries, to readers.

Tuesday, September 6, 2011

Re: Foreign language fields in English language records

Posting to Autocat

On 06/09/2011 17:05, James Bowman wrote:
<snip>
I am curious to know how libraries are treating foreign language 6XX fields which duplicate or supplement English language 6XX fields in the same English language record. My undergraduate library accepts OCLC records verbatim, adding only a call number if one is lacking, and appropriate local notes. Knox College has students from 31 countries in its entering Class of 2015. If you delete such fields, why?

https://i-share.carli.illinois.edu/knx/cgi-bin/Pwebrecon.cgi?DB=local&v1=1&BBRecID=230480
https://i-share.carli.illinois.edu/knx/cgi-bin/Pwebrecon.cgi?DB=local&v1=1&BBRecID=231651

I personally feel confounded by what I see but realize my era is over!
</snip>
Well, my own attitude is: if somebody with your experience has trouble, those with less experience must also. Still, I don't know if something like this actually bothers people much today. Sometimes when I do a search on Google and get one of those *outrageous* results, I think that twenty years ago, if I had been in a library and found something like that in a catalog or index, I would have raised a big stink about seeing this kind of stuff when I was doing something serious. Today, we just let these crazy results slide by.

So, with your examples, has anybody made a complaint or said anything to the librarians? I am 100% positive the patrons do not understand what is going on or why those headings are there. Maybe everybody has just let them all slide by. If nobody has mentioned it, these are the sorts of situations when I wonder: what would it take to get some kind of reaction? Would headings in Egyptian hieroglyphics or Church Slavic or Klingon get their notice? Maybe they would notice pictures of E.T. or Lady Gaga, or then again, maybe not. I don't know.

Maybe somebody should undertake this as a research project!

Monday, September 5, 2011

Re: Others' opinions of librarians (was: Fw: Re: [textualcriticism] Lorenzo Valla, Distigmai, and Trikles)

Posting to Autocat

On 05/09/2011 02:47, Michael Borries wrote:
<snip>
A short portion from another list with a non-librarian's view of the need for librarians (in this case, catalogers).

And perhaps this would be a good place to state that in forwarding Peter Daniel's post (I think it was his), it was not my intention to "bash" any particular library, but both that post and this are meant to show that others outside the library are concerned about what we do. I think if we "dumb down" too much, we may lose support in some academic circles.
<snip>
The reason for the error is simple (which again shows that GB needs to slow down a bit and cooperate with good librarians): three books are bound together in one volume; the data in GB belong to the third entry (BH MSS 41-3), but what you see is the first (BH MSS 41-1)
</snip>
</snip>
This reminds me, once again, of the Language Log post "Metadata Train Wreck" http://languagelog.ldc.upenn.edu/nll/?p=1701 and the associated discussion in various places. Here is a newer post from Scholarly Kitchen, "The Terrible Price of Free: On E-reading Jane Austen via Google's Ebooks": http://scholarlykitchen.sspnet.org/2011/03/14/the-terrible-price-of-free-on-e-reading-jane-austen-via-googles-ebooks/

The real complaint in both of these postings, however, concerns accuracy in its various guises in cataloging records or OCR, rather than some of the "higher matters" of Web 2.0 and 3.0. The post you mention seems to be similar. While I completely agree that these matters are highly important, from the cataloging point of view, matters of accuracy are considered rather elementary. In the Language Log discussion, the author does mention "classification", e.g. how some editions of "Jane Eyre" are classified under "Architecture" or "Antiques & Collectibles", but in neither article is there a discussion of difficult issues of name authority control or uniform titles, let alone more complicated subject analysis.

Still, the need for simple accuracy is apparently becoming important to the average reader and therefore, the inventory aspects of a library catalog seem to be considered more important than previously. This is certainly a positive development.

My concern, however, is that these are the easiest sorts of problems to fix and, I will venture, rarely need the talents of a professional cataloger. Are searchers equally concerned about the lack of authority control for personal names, corporate names, or subjects? I would hope so, but I fear the very idea is becoming lost among people--that is, if they ever had it to begin with.

To Be or Not To Be...Opinionated

Comment on "To Be or Not To Be...Opinionated" by Leah L. White, Library Journal, Sept. 1, 2011

Interesting article, but there are several topics here. The first question, about Sarah Palin, was very possibly a trap. I wrote in an email list/blog post of mine that "I think it is really important *not* to believe that all librarians represent a single political ideal since they neither represent a single political entity, nor should they. The moment they do speak out politically, they can become isolated by someone, somewhere, and at the same time they alienate a large number of the members of their own profession." http://catalogingmatters.blogspot.com/2011/08/re-day-made-of-glass.html

Still, it is not hypocritical for librarians to say they are important to society. It is merely stating a fact, but the general public may not be aware of this fact. Naturally, librarians should be required, and able, to back up their statements if questioned--and that is the entire point. When confronted, they can begin to explain why, hopefully in language the average person can understand and appreciate, and these are the times when people may listen and librarians may have an impact. But, when arguing for the importance of libraries, librarians are also ethically compelled to provide the "non-librarian" view as well, in an unbiased manner. This makes us quite different from many other members of society.

Being unbiased does not necessarily mean the same thing as keeping silent when you see someone doing something that will harm them and you. This certainly applies to information. There is still a moral responsibility to speak up.

Friday, September 2, 2011

Re: How Google makes improvements to its search algorithm

Posting to NGC4LIB

On 31/08/2011 15:17, Jimmy Ghaphery wrote:
<snip>
I am fascinated by the notion of imprecise or custom search results and the way in which it challenges our expectations in the libraries.

An important aspect to the appropriateness of fuzzy results is the characteristics of the underlying data. In the case of Google we are talking about a huge data set that can at best be loosely corralled. In this context, using additional data such as usage patterns and geographic location of the searcher makes perfect sense to me. For a scientist searching a genomic database, it makes sense that results need to be predictable and repeatable.

It is not crystal clear to me where library data might fit along this continuum. Considering the potential scope of the next generation catalog I do think we need to embrace notions of rich algorithms and rapid iteration to tease out relevant results. In reality our results change every day that we add records (sometimes radically if we are bulk loading). How scientific do we need to be here? Do we entertain requests for a researcher who wants to see results from our previous system or the results we presented from a search even a year ago?
</snip>
I remember a news report from the BBC about how, because of the various tweaks, Google keeps losing a city in Florida, and the consequences for the people living in that town! http://news.bbc.co.uk/2/hi/programmes/world_news_america/9038870.stm. (When I read a story like this, I often "teleport" back in time 25 years mentally and try to imagine what I would think. I would find this one completely incomprehensible!) I sent a post to Autocat http://catalogingmatters.blogspot.com/2010/09/disappearing-cities.html where I discussed my own views, and there was a short dialog.

One suggestion for fitting in library data was made by Eric Hellman in a talk at ALA, which I mentioned in another post to Autocat that provoked more dialog. http://comments.gmane.org/gmane.education.libraries.autocat/40227. To make sure that I was not misinterpreting him, I wrote to him and he got involved too, in another thread http://comments.gmane.org/gmane.education.libraries.autocat/40267. Essentially, he was saying that in the future, people would very rarely interact with library metadata as they do now (i.e. by looking at catalog records), and that it would be used more as "microdata" http://en.wikipedia.org/wiki/Microdata_%28HTML5%29 behind the scenes, re-sorting and reworking search results, or for Search Engine Optimization. I mentioned the Google Books project with all of its metadata, which most people probably don't even know about, but there has to be a lot going on behind the scenes there.

There is a very definite role for library metadata in the future. I personally think it has to do with ensuring a level of standardization that guarantees the inevitable tweaks do not lead to something like Google's misplacing of towns. Also, it becomes clearer and clearer to me that people really don't like to interact with the library's catalog: how it works, how it looks, even what it is. The catalog is becoming a strange thing for the average person. I think Hellman is onto something and may be on the right track toward a solution. Seen in this sense, the Google raters example may prove invaluable.

Re: How Google makes improvements to its search algorithm

Posting to NGC4LIB

On 30/08/2011 20:04, Joseph Montibello wrote:
<snip>
Jim W. wrote:
Google does not allow any kind of "guaranteed" or "standardized" access--just the opposite. If the results vary for you and me, and even vary for ourselves depending on where we are searching from, plus it is tweaked almost twice a day, I think the public could possibly understand the argument for a more standardized means of access.
I think personalized is better, from the perspective of most patrons. If you're doing research in medicine, you probably want to privilege recent stuff over older stuff. However, this doesn't mean that the metadata needs to be personalized. The underlying data needs to be standardized, but that doesn't mean the presentation of the data (including search result ranking) should be one-size-fits-all.

Why does Google tweak their algorithm constantly? Lots of reasons, I'm sure, and not all of them would be comforting to us. But I do think that they've shown an ability to produce useful results. So I'd argue against aiming at standardized access for all patrons. Returning personalized results sends a message to the patron - "we're trying to help you." In many cases, our standardized results tell the patrons "We think we have the answers, and one of those answers is that there's a whole skillset that you need to learn before you can do what you thought you wanted to do."
</snip>
Yes, thanks for clarifying that for me. Patrons should be able to work as they wish with the search results, but the results themselves, at least a part of them, should be standardized in some way to permit guaranteed access, i.e. a search that worked yesterday should work today and tomorrow as well.
<snip>
The best part of the video was its emphasis on big-time systematic testing and evidence-based decision making. One guy mentioned that for every time a certain feature didn't work, they wanted to be sure it worked 50 times. I suspect there's no sound reason for that ratio, it's just a practical line in the sand that they can shoot for.

How can we get that kind of production testing?
</snip>
It takes a lot of resources and control over your own systems. A single, rich corporation like Google can do it, but for a diverse, loosely organized group such as librarians, it would be much more difficult. Related to your previous comment, I think it's important to show our patrons that we are *trying* to improve matters *for them*, and that means there will be experiments, some of which might fail. Although failure is not such a great thing, I think the general populace understands that nothing is perfect and everything can be improved. That's how Google etc. work, and perhaps that is the lesson we should take: gradual, tiny improvements.

Re: Justification of added entries

Posting to RDA-L

On 30/08/2011 23:04, Heidrun Wiesenmueller wrote:
<snip>
It might be worthwhile taking a look at cataloguing conventions used outside the Anglo-American world: According to the German "Rules for alphabetical cataloguing", we've added relator terms for persons such as "editor" or "translator" for some 30 odd years - only we call this information "function designators", and the list of possible designators is much shorter than those in RDA. I suppose it would be possible to extract this information from German catalogues and add it to the corresponding records in Anglo-American databases. This might be a starting point for enriching AACR2 legacy data with this kind of information.
</snip>
and
<snip>
I don't think people have forgotten the concept. I'd argue they never knew what it was in the first place, and I also believe that it would not be fair to demand that users of catalogues struggle with that kind of thing - this is *our* job. It's also *our* job to design systems for authority control which work in keyword searches. In German catalogues, this is no big deal: It simply doesn't matter whether the preferred form or a variant form of a name is entered as a keyword - you always get the same title list as a result. And, of course, this also works for subject headings. The reason why German catalogues don't have a problem here is due to the different data model used in our systems: Instead of typing in authorized forms as a text string, we create links between the title record and the authority record for the person, corporate body, or subject heading in question (using the ID number of the authority record). This means that not only the preferred form is available for indexing, but the variants as well. Actually, it's been puzzling me for some time why American librarians seem to be simply putting up with the fact that an essential tool of our trade does not work with keyword searching in their systems. Shouldn't there be crowds of librarians demonstrating in front of the offices of ILS suppliers, demanding that a technical solution be found for this problem?
</snip>

Of course, I agree. There are so many technical solutions available today for problems that had once been seen as practically insoluble. This demands a different mindset, however, one that is willing to give up a certain amount of control and to trust the information from other initiatives. This brings me back to the idea of genuine, enforceable standards as a basis for that trust, but I have written about this several times already.
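To make the linking model Heidrun describes a bit more concrete, here is a minimal sketch in Python. It assumes a toy setup: the record structures, the ID "per001", the variant name forms and the function names are all hypothetical and do not come from any real ILS or from the German systems she mentions. The point is only that when a title record stores the authority ID rather than a text heading, the indexer can pull in every form of the name, so a keyword search on a variant retrieves the same titles as a search on the preferred form.

```python
from collections import defaultdict

# Hypothetical authority record: one ID, one preferred form, some variant forms.
authorities = {
    "per001": {
        "preferred": "Goethe, Johann Wolfgang von, 1749-1832",
        "variants": ["Von Goethe, Johann Wolfgang", "Gete, Iogann Volfgang"],
    },
}

# Hypothetical title records: they link to the authority by ID
# instead of storing the heading as a text string.
titles = {
    "t1": {"title": "Faust", "creator_ids": ["per001"]},
    "t2": {"title": "Die Leiden des jungen Werthers", "creator_ids": ["per001"]},
}


def tokenize(text):
    """Lowercase a string and split it into keywords, dropping commas and periods."""
    return [tok.strip(",.") for tok in text.lower().split() if tok.strip(",.")]


def build_index(authorities, titles):
    """Index each title under its own title words and under the words of
    every form (preferred and variant) of each linked authority record."""
    index = defaultdict(set)
    for tid, rec in titles.items():
        for tok in tokenize(rec["title"]):
            index[tok].add(tid)
        for aid in rec["creator_ids"]:
            auth = authorities[aid]
            for form in [auth["preferred"], *auth["variants"]]:
                for tok in tokenize(form):
                    index[tok].add(tid)
    return index


def keyword_search(index, query):
    """Return the IDs of titles that match every keyword in the query."""
    hits = [index.get(tok, set()) for tok in tokenize(query)]
    return set.intersection(*hits) if hits else set()


index = build_index(authorities, titles)
print(sorted(keyword_search(index, "gete")))    # ['t1', 't2']
print(sorted(keyword_search(index, "goethe")))  # ['t1', 't2']
```

Because the title records hold only the ID, adding a new variant to the authority record and re-indexing makes it searchable for every linked title without touching the title records themselves, which is the contrast with storing headings as text strings.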

The only point I might take exception to, and it is a small one, is that I think people did grasp the *concept* of authority control in a card/printed environment because it was the only way it functioned. When there was no possibility of keyword searching and all you could do was browse, if you wanted something by Johann Wolfgang von Goethe, you could look under "Johann" and would find nothing or almost nothing. One hoped that you would then either ask for help or decide to look under "V" (would anybody look under "W"?!) to find a cross-reference to "Goethe", where you would then walk over and see everything arranged very nicely. People could, even subconsciously, get an idea that there was a real organization to it, although they did not understand it. The complexity was clear to them, however. Maybe they didn't have much of an idea of BT, NT, RT because often those cross-references were in a separate place. Today, all of that structure goes unused with keyword searching, at least in US catalogs.

Ultimately, this is an argument only of historical importance since I think everyone agrees that people just do not understand it today. Apparently, even searching under surname is being lost! Plus, we need to separate out what we can do today from what people genuinely want and need. This is also called creating a valid business model. Certainly, there are some things that we can do today that we couldn't before, but it still doesn't follow that we should do those things *if* they are not what people want and need. That would be throwing away our resources on pointless tasks. Basic, traditional authority control (or conceptual searching, as I have tried to explain it to patrons) is something that I think people really do want, but it must be made to function in today's world, and this means to somehow interoperate with full-text and other types of databases using different thesauri.

After getting things to work more or less as they are supposed to, there will be plenty of time to change the cataloging rules.