Hello everyone. My name is Jim Weinheimer and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy.
In this program, I would like to discuss open archives: what they are and whether they are important; in short, what should we do with them?
For those who aren’t sure what an open archive is: it is nothing really special. It’s just a computer database, created and maintained by an organization (probably academic or professional), that aims to provide digital resources to the public for free. This database will very possibly conform to some of the standards from the Open Archives Initiative (http://www.openarchives.org/). Although the saying “Information wants to be free” is not true, since somewhere, somebody has to pay something, with open archives the people who use the materials are not the ones who pay. This is the essence of open access, which is a method of publishing. Open archives are not public in the sense that anyone can place materials in them, the way anyone can upload a video to YouTube http://www.youtube.com or a document to Scribd http://www.scribd.com/ An open archive accepts materials only from members of the community it belongs to: for example, a university open archive allows its local faculty and/or students to add their materials, while the high-energy physics archive at Cornell http://arxiv.org/ allows only recognized physicists to add theirs. E-LIS (http://eprints.rclis.org/) is an open archive for the profession of librarianship, and I have added some of my own articles there. (By the way, links to all of these sites, as well as to everything else, are in the transcript.) Finally, an open archive should have some kind of organization behind it, since a project of a single person can easily disappear if that person can no longer afford it, becomes ill, or just gets tired of it.
What makes open archives open is that anyone is supposed to be able to access the materials inside them without payment, so open archives are one way of promoting open access. Naturally, many publishers do not take kindly to this, since they believe they are losing out on a lot of cash, or at least they claim to. Scholarly authors, however, who get no financial return from the publishers, or at most a pittance, are beginning to discover the advantages of this open access regime: they have learned that when they place their materials into an open archive, they can increase their citation rates significantly. Of course, being cited is the main way that scholars are rewarded, whether through promotions or even by being head-hunted by another institution.
It’s important to note that not everyone agrees that materials in open archives are cited more often. In the transcript, I provide links to some of those articles. I’m not going to discuss them here, but to be honest, these disagreements are based on statistical technicalities and fail to convince me. We should not be comparing long-established, respected peer-reviewed journals to an article placed here or there in a random open archive; I see that as a completely different argument. To me, it only makes sense that an article available for free online with a click has a much better chance of getting cited than one that forces people whose library lacks a subscription to pay $30 or $40 for it or go through the hassle of an ILL. An argument against something that seems so obvious needs exceptionally strong support. Peer review enters into this debate, and we’ll discuss it later in this program. [http://scholarlykitchen.sspnet.org/2010/01/07/citation-advantage-for-mandated-open-access-articles/; http://scholarlykitchen.sspnet.org/2011/02/08/oa-citations-spurious-relationship/]
The business model for open archives is a little different: instead of being designed to make money for organizations, it is aimed at saving money. For open archives to work, everyone involved must share their materials as widely as possible. In this sense, the model is the same as copy cataloging: a significant number of institutions must be willing to create and share their catalog records, and if nobody shares, the model obviously falls apart. Creating and maintaining an open archive costs a significant amount of money, and if only one or two people, or only a handful of organizations, make their resources available, the results will not justify the costs. The more everyone makes their resources available through open archives, the more everyone will benefit. Naturally, there are serious concerns about all the different aspects of quality, and we’ll discuss these issues in depth later as well.
But imagine for a moment an ideal situation, where all scholarly resources were placed into open archives. In such a world, institutions could save untold scads of money, because the budget that would normally go to buying materials from publishers could be switched to maintaining the local open archive, which has its own costs but would be far cheaper. Such a level of cooperation and openness will probably not happen anytime soon, if ever, but the main idea is to understand how much could be saved by freeing each institution from the necessity of buying its own copies of the same journals, the same books and the same everything else, while at the same time providing everyone with many more resources than ever before.
This kind of model could never have worked in a physical, printed information environment, but it can work in the virtual environment. For example, all that would be needed(!) to make the open access movement complete right now is for the Elseviers and Ebscos and Springers and all the other publishers simply to remove the controls on their information and let everyone access their materials for free! It would actually be a lot easier for them, but I’m not holding my breath, since I don’t think they will do that anytime soon!
Growth of Open Archives
When I first learned about open archives, I was very skeptical of them, yet it seemed safe to assume that their materials would grow. Gradually, it dawned on me that the number of materials available to our patrons was growing at what appeared to be an exponential rate, and therefore the pressure on cataloging and our catalogs would necessarily grow in similar fashion. It seemed to me that if library catalogs did not rise to this challenge and include these materials, our patrons would find our catalogs less and less useful, because the catalogs would be giving access to a rapidly decreasing percentage of the information that was really available to them. That seemed to be a path to oblivion, and it still does, at least to me.
Moving ahead a few years, there were some tremendously important decisions. First came Elsevier’s decision to allow authors to upload copies of their articles published by Elsevier into an open archive (when I read the news, I couldn’t believe it at first!) http://www.elsevier.com/wps/find/editorsinfo.editors/editors_update/issue8; then came the declarations by Harvard http://hul.harvard.edu/news/2009_0901.html and other universities that their faculties should place a copy of every article they publish into a local open archive. Then university presses began to put their backlists into open archives; two of the biggest are the presses of the University of Michigan and the University of California [University of Michigan, http://www.press.umich.edu/digital/hathi/ and University of California http://publishing.cdlib.org/ucpressebooks/]. If properly followed up, I believe such decisions could prove to be game changers. The financial disaster that we have all been dealing with for a few years now, plus the statistical trends of open archives, seem to suggest that I was correct in my assessment that they will continue to grow, perhaps at an exponential pace. Significant amounts of money must be saved, and this is one way to do it.
In the transcript are some statistics from ROAR (the Registry of Open Access Repositories) and OpenDOAR (Directory of Open Access Repositories). I’ll summarize the results.
ROAR (Registry of Open Access Repositories) Database
(Growth graph: http://roar.eprints.org/cgi/roar_graphic?cache=426449 [the graphic must be generated])
ROAR shows the number of repositories growing from around 250 in 2003 to over 2,000 today, while the number of records has gone from around 200,000 in 2003 to over 9 million today!
OpenDOAR (Directory of Open Access Repositories) Database
A similar trend can be seen in OpenDOAR, although its growth is less dramatic than ROAR’s: in mid-2006 there were around 400 repositories, while today there are almost 2,000. Not all open archives are listed in these registries, but it seems safe to conclude that creating open archives is popular.
Problems with Open Archives: Aspects of Quality
Of course, not all is sunny with open archives, and they do pose many challenges. I mentioned the various types of quality earlier, and one concern is the quality of the information found in open archives. Often the materials in open archives have not gone through peer review, but I would like to point out that this is not necessarily such a bad thing. (I assume that everyone here understands what peer review is, but just in case, I provide a link to a wiki page I made for students: http://aurlibrary.wetpaint.com/page/Scholarly+Publication+Process)
I would like to relate a story that could be described as either famous or infamous, depending on your point of view. John Mearsheimer of the University of Chicago and Stephen Walt of Harvard University co-wrote a paper for the Atlantic magazine titled “The Israel Lobby and U.S. Foreign Policy”. The magazine turned the article down, so the authors decided to place it into the Harvard open archive, where everyone could download and read it. http://belfercenter.ksg.harvard.edu/publication/3082/israel_lobby_and_us_foreign_policy.html
The topic of their article was highly controversial and it began to get noticed: it was formally published in journals such as the London Review of Books and Middle East Policy, and it wound up being published as a book, all the while provoking a major dispute in the general press. Here we can see how useful open archives can be when dealing with highly controversial topics, and we can also see that traditional peer review can have definite problems, since it can occasionally act as a censor. As this incident shows very clearly, there is a post-peer review process, and it is carried out all over the “information universe”. Post-peer review can be much more effective, and far more interesting for everyone concerned, than pre-peer review. [http://futureofscipub.wordpress.com/open-post-publication-peer-review/]
A post-peer review system can work in various ways: it can be embedded within the open archive much as the talk section works in Wikipedia http://en.wikipedia.org/wiki/Talk:The_Israel_Lobby_and_U.S._Foreign_Policy. In the transcript I provide a link to an example article available at the Public Library of Science (PLoS), which has post-publication comments and ratings. http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1001030
A post-peer review system can also exist separately. Especially worthy of note is the remarkable site Faculty of 1000 http://f1000.com/, where experts evaluate medical articles, provide reviews and some controls, and rank the articles anywhere from “Must read” to something less. Apparently people value the selection and ease of use of this site very much, since they must purchase subscriptions, and please note: not for the content of the papers, but for the expert selection and reviews.
One of the main criticisms of post-peer review is that, as has been demonstrated, scholars normally don’t participate: most never rate or review anything at all, and the question is: why don’t they? One suggested answer has been that there are no incentives to participate, but I reply that scholars have never received any payment for peer review either. My suspicion is that in the normal peer review system, when someone asks you to do a review, you are rewarded automatically: after all, you have received recognition from colleagues who have taken the trouble to single you out from other possible candidates, and all they want is your opinion. That can be quite flattering. This important type of reward does not happen in the post-peer review systems that I have seen, so something else needs to be devised, perhaps a more tangible reward system: being able to put the reviews on your resume, general recognition in your profession for good reviewing, or something similar.
Systems can be built that make it clear that an article has not been reviewed at all and therefore needs reviewing, or has received primarily negative reviews. I personally believe that the so-called “double blind” system where the author is not supposed to know the reviewers, and the reviewers are not supposed to know the author, does not work and reviews should be both signed and open. This could also be incorporated into a system that would give prominence to articles with signed reviews over anonymous ones. There really are many, many options.
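To make the idea a little more concrete, here is a minimal sketch in Python of how an archive might flag articles that still need reviewing or that have drawn mostly negative reviews, and then give prominence to signed reviews over anonymous ones. Everything in it is a hypothetical illustration: the `Review` and `Article` classes, the 1 to 5 rating scale, and the rule that a rating of 2 or lower counts as negative are assumptions, not features of any existing repository.

```python
from dataclasses import dataclass, field

# Hypothetical data model; not the schema of any real open archive.
@dataclass
class Review:
    rating: int   # assumed scale: 1 (poor) to 5 (excellent)
    signed: bool  # True if the reviewer's name is public

@dataclass
class Article:
    title: str
    reviews: list[Review] = field(default_factory=list)

def status(article: Article) -> str:
    """Label an article as unreviewed, mostly negative, or reviewed."""
    if not article.reviews:
        return "needs review"
    # Assumption: a rating of 2 or lower counts as a negative review.
    negative = sum(1 for r in article.reviews if r.rating <= 2)
    if negative > len(article.reviews) / 2:
        return "mostly negative"
    return "reviewed"

def ranking_key(article: Article) -> tuple[int, int]:
    """Sort key: more signed reviews first, then more reviews overall."""
    signed = sum(1 for r in article.reviews if r.signed)
    return (-signed, -len(article.reviews))

def rank(articles: list[Article]) -> list[Article]:
    return sorted(articles, key=ranking_key)

if __name__ == "__main__":
    a = Article("A", [Review(4, False), Review(5, False)])
    b = Article("B", [Review(4, True)])
    c = Article("C")
    print([x.title for x in rank([a, b, c])])  # ['B', 'A', 'C']
    print(status(c))                           # needs review
```

With this key, an article with one signed review sorts ahead of one with two anonymous reviews; weighting the two counts differently would be just as easy, which is one of the many options mentioned above.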
Still, no matter how people feel about post-peer review, the genie truly is out of the bottle: articles, books and other resources will be placed in open archives, many will be consulted and commented upon somewhere, and either there will be a system that collects these subsequent comments and reviews, or there will not. Of course, some articles and resources will never be read or commented upon, but the more important materials will be discussed, and those discussions can appear anywhere. A system that collects those discussions in some way would benefit everyone. For me, the question is not whether post-peer review works; it’s going on right now and has been going on for millennia. The task is to make a system that works the way scholars and librarians need it to work. For instance, if I came across that article by Mearsheimer and Walt in the London Review of Books http://www.lrb.co.uk/v28/n06/john-mearsheimer/the-israel-lobby, I would want to know about the criticisms and responses, not only the few within the London Review, but those on all kinds of other sites as well.
No matter what is decided, in my opinion an open archive needs to include some kind of peer review mechanism, whether pre- or post-publication. Versioning is also important, so that authors can add updated versions of their papers based on the comments they receive.
Finding Materials in Open Archives and Metadata
After all these preliminaries, I finally get around to cataloging. Although this has been rather roundabout until now, I thought it was just as important to discuss the quality of the materials housed in an open archive as the quality of document description and retrieval.
It has been much more difficult to find OA resources than so-called “regular” resources because they fall outside the normal bibliographic workflows (you won’t find them in EBSCOhost or Elsevier). Publishers don’t make ONIX records for them because nobody makes any money at it; after all, as I mentioned earlier, publishers are businesses whose purpose is to make money, while an open archive is not designed to make money. But aside from this, it is important to emphasize that the problem is not a lack of metadata: whenever authors place a resource into an open archive, they must provide a “metadata record” for it, which normally includes the author’s name or names, the title of the resource, perhaps some keywords, and so on. Mostly, though, this information is not standardized in any way. How does this turn out?
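To illustrate what such a record can look like, here is a minimal Python sketch that reads one record in unqualified Dublin Core (`oai_dc`), the format that every repository supporting the Open Archives Initiative’s harvesting protocol (OAI-PMH) must be able to expose. The record itself is invented: the mixed name forms (`Smith, J.` versus `Jane Doe`) and the run-together keyword field are assumptions meant to show the kind of inconsistency described above, not data from any real archive.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and the Dublin Core elements.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical self-deposited record; all values are invented for illustration.
SAMPLE = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>open archives and library catalogs</dc:title>
  <dc:creator>Smith, J.</dc:creator>
  <dc:creator>Jane Doe</dc:creator>
  <dc:subject>OA; repositories, Cataloguing</dc:subject>
</oai_dc:dc>
"""

def extract(record_xml: str) -> dict[str, list[str]]:
    """Return title, creator and subject values exactly as deposited."""
    root = ET.fromstring(record_xml.strip())
    return {
        tag: [el.text or "" for el in root.findall(f"dc:{tag}", NS)]
        for tag in ("title", "creator", "subject")
    }

if __name__ == "__main__":
    print(extract(SAMPLE)["creator"])  # ['Smith, J.', 'Jane Doe']
```

Nothing in the format forces the two creators into the same name order, or the keywords into separate, controlled terms; harvesting faithfully copies whatever the depositor typed.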
Whenever I have heard repository managers discuss their open archives, of which they are very proud, the question of metadata invariably comes up, but you get precious few details about the quality of that metadata. I have found that a question such as “What is the quality of the metadata in your repository?” most often meets with the reply, “Pretty good”.
What does that mean?
To be fair, it is very difficult for non-specialists to understand the need for shared standards, or even that shared standards exist. In fact, I was once discussing this very problem with the head of a repository and mentioned that in library catalogs, maintaining consistency is the overriding factor because that is the only way of providing a reliable result. This means that whether or not you happen to agree with a certain practice, such as the way a name heading has been set up, you cannot just make up a form you like better and stop there. You must either follow what has been done before, or change all previous occurrences of the former heading to the new one, which can be very difficult, and practically impossible in a networked environment. He did come to understand the importance of consistency, and when he did, he said something I thought very perceptive: in his opinion, when scholars create metadata for their own articles, they view it in one of two ways, either as completely boring and unimportant, or as an extension of their own creativity. He saw that both approaches are wrong for the purpose of ensuring reliable search results.
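The consequence of inconsistent headings can be shown with a minimal Python sketch. The catalog here is a hypothetical dictionary mapping record IDs to author headings, and the search is a plain exact match standing in for a real catalog index; both are assumptions for illustration only. A search on one form of a name silently misses records entered under another form, and the only repair is to change every occurrence.

```python
# Hypothetical catalog: record ID -> author heading exactly as entered.
catalog = {
    101: "Doe, Jane",
    102: "Doe, Jane",
    103: "Doe, J.",  # a form someone "liked better"
}

def search(index: dict[int, str], heading: str) -> list[int]:
    """Exact-match heading search; a stand-in for a real catalog index."""
    return sorted(rid for rid, h in index.items() if h == heading)

def change_heading(index: dict[int, str], old: str, new: str) -> int:
    """Rewrite every occurrence of the old heading; return the number changed."""
    changed = 0
    for rid, h in list(index.items()):
        if h == old:
            index[rid] = new
            changed += 1
    return changed

if __name__ == "__main__":
    print(search(catalog, "Doe, Jane"))  # [101, 102]: record 103 is missed
    change_heading(catalog, "Doe, J.", "Doe, Jane")
    print(search(catalog, "Doe, Jane"))  # [101, 102, 103]
```

In a single local file the global change is one loop; once copies of the records have been shared across many systems, every copy would need the same change, which is why it becomes practically impossible in a networked environment.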
I personally think that it is just as unfair to expect untrained persons to create metadata records that follow shared standards as it is to expect people who have never worked as mechanics to change a radiator on their car or replace a windshield. Non-experts do not have the training, the tools, or in many cases, the interest to do a competent job, and there is absolutely nothing wrong with that. But is there a solution?
From this point on, I plan to do some analysis of metadata and the complexity of this discussion will go up, so perhaps I have gone on long enough at this point and shall stop for now. In the next episode, I shall analyze what “pretty good” means in practice, if it actually is “pretty good”, and what role catalogers could play in the task of getting these materials under control.
The music I have chosen to end this program is the second movement, the Adagio, of Albinoni’s Oboe Concerto in D minor, Op. 9, No. 2, published in 1722. It was performed by the Sarre Radio Chamber Orchestra, and I think it is a wonderful performance of this famous piece, although unfortunately it’s from a vinyl record and you can hear a scratch for a few seconds. For fans of library history: most of Albinoni’s unpublished music was held in the Dresden State Library when that city was destroyed in the firebombing of 1945, so much of his work is gone. http://www.youtube.com/watch?v=qSAJ1yuBozA
That’s it for now. Thank you for listening to Cataloging Matters with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic city in the world.
See also: Open Archives pt. 2