Wednesday, January 4, 2012

Cataloging Matters #13: Thoughts on Open Development

Hello everyone. My name is Jim Weinheimer and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy.

In this episode, I want to discuss something a little different: I have already talked about open archives (part 1, part 2), but there are lots of other types of open. Here, I want to concentrate on some of the technical aspects of open source development and how people have managed to get these kinds of projects under control. Finally, I would like to suggest a possible future where the cataloger can play a major role.

Before I begin, I would like to dispense with a point of grammar. In this podcast, I shall use the term “open” as opposed to the term “openness.” I realize that using the simple term open is rather awkward, but I do so to emphasize its difference from openness, which is similar but has additional meanings attached to it. This follows what I did in another of my podcasts where I maintained that search is actually quite different from searching. I will return to this later.

And by the way, there are links to everything I discuss in the transcript. 

What does OPEN mean? We seem to come across this term more and more often: open source, open education, open systems, open government, open relationships, open season, open mike, open marriage... The list can go on and on. I do not intend to analyse all of these types of open here, which are quite varied, but the sense in which I want to discuss it is that open is in reality a philosophical view, and this view can be shown in many levels of our society, not only within computers and their systems. 

Nevertheless, to get a clear initial idea of what open means, it is probably easiest to think of it in terms of computer software, and to discuss the differences between open source and freeware on the one hand and proprietary software on the other.

Here we go! 

Proprietary software is the simplest to understand: a program was created by Microsoft or Apple and is their property. Although you have bought your own copy, those companies still retain several rights. No one can change the copy of that program or do anything with it except what the owners allow. These programs may even be free to install and to use, such as iTunes, but mostly you have to pay for them, as we do for Microsoft Office. The basic point is that if you would like the program to work in ways other than the owners allow, you must ask them to make the changes and they can choose simply to ignore you. Even if you have the technical knowledge, you are still not allowed to change the program.

Have you ever wondered how these owners prevent developers from working on their own copies of proprietary software in the privacy of their own homes? If you owned a copy of a book, you could take it apart to discover how it was made and so on. But you can’t do this with a software program. But let’s face it--who is going to know if you do it at home in your study anyway? After all, the companies aren’t spying on everybody--are they? No, they aren’t, but they do something else: they do not give you the source code.

What is source code? The source code is what computer programmers write using special languages such as Java or C or C++ or Perl or PHP or any of the other programming languages that are completely incomprehensible to the layman. But what’s more important is that these languages ARE comprehensible to an expert. Still, the source code on its own will not get the computer to do anything because to the computer, the source code is just as incomprehensible as to the layman. The only thing computers understand is what is called machine language, that is, all those 1s and 0s of binary code. Therefore, after the human programmers write the source code, they must use a special program to convert the source code into machine code.
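
For those reading the transcript, here is a minimal sketch of that conversion step, written in Python. One caveat: Python converts source code into bytecode for its own interpreter rather than into true machine code, so this is only an analogy for what a C or C++ compiler does. The function name add is made up for the illustration; compile, exec and the dis module are part of standard Python.

```python
import dis

# Source code: plain text that a human programmer can read and edit.
source = "def add(a, b):\n    return a + b\n"

# The "special program" step: compile() turns the text into a code object
# holding bytecode (an analogy for machine code, not the real thing).
code_object = compile(source, "<example>", "exec")

# Run the compiled module body so that the function gets defined.
namespace = {}
exec(code_object, namespace)
add = namespace["add"]

# The compiled form of the function is just a sequence of bytes.
raw_bytes = add.__code__.co_code
print(type(raw_bytes), len(raw_bytes))

# A disassembler gives an expert a readable listing of those bytes.
dis.dis(add)

# The compiled function still does what the source code said.
print(add(2, 3))
```

The raw bytes are roughly what you receive with a closed program, and the disassembly listing is roughly what an expert without the source code would have to work from.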

With proprietary software, you receive only the machine code (that is, the 1s and 0s) and not the source code. While it is theoretically possible to reverse engineer the source code from the machine code, it is a massive amount of work, and it is often prohibited by the license or by law anyway. Looked at this way, the source code for a program is similar to the Rosetta Stone, which allowed Champollion to decipher Egyptian hieroglyphics. Before the discovery of that stone, everyone was pretty much helpless.

This is what people mean when they talk about open source as opposed to closed source: open source makes the source code publicly available and closed source does not.

Both freeware and open source are very similar, yet there are still some subtle differences between the two. What are those differences? Not very much, as it turns out. The real differences are actually more on the philosophical side, and the final products are essentially the same.

Freeware concentrates on the idea that the software is not free as in free kittens or free beer, neither of which is without costs to you or somebody somewhere. Even though the software is free to download and to use, it will need some resources, such as a server, someone to do maintenance and so on. Freeware concentrates on a different meaning of the word free, considering it free as in freedom, that is, you do not have to ask permission from anyone to edit the program to function however you want it to, for whatever reasons you might have, and then to share it with anyone you wish.

Open source software is software that you can download for free and install legally, whose actual source code you can look at, and which you are free to edit for your own purposes and to share in your own version, if you wish. Therefore, open source software is focused on the process.

So today, open source software concentrates on the advantages of so-called crowdsourcing to develop the software, with the result that you can have developers quite literally all over the world working 24/7 on the software in all kinds of ways. Freeware is more ideologically opposed to proprietary software, which forbids anyone except the company that owns it from changing a software program in any way, even if you do so only for your own purposes and do not share it with anyone at all.

The subtle difference can probably best be understood in the titles of two of the important works on this topic. The first is “Free as in Freedom”, Sam Williams's book about Richard Stallman, who was one of the very first people to begin this way of thinking. It describes how the freeware movement (freeware in this context now takes on a different meaning) began with an encounter Stallman had with a feisty Xerox printer! This book is available for free on the web.

One of the basic tenets of freeware is “The freedom to run the program”:
“The freedom to run the program means the freedom for any kind of person or organization to use it on any kind of computer system, for any kind of overall job and purpose, without being required to communicate about it with the developer or any other specific entity. In this freedom, it is the user's purpose that matters, not the developer's purpose; you as a user are free to run the program for your purposes, and if you distribute it to someone else, she is then free to run it for her purposes, but you are not entitled to impose your purposes on her.” (From the GNU philosophy)
Quite a statement when you think about it. To get a better understanding of this mentality, you might want to watch the films Tron and especially Tron Legacy!

The other major book is The Cathedral and the Bazaar by Eric Raymond, which discusses how freeware/open source products can be built: from the bottom-up (the Bazaar) where the process of development is open for everyone to see and even participate in, or from the top-down (the Cathedral), which makes only the final product open and free to all but not the process of development. The Linux operating system is built on the Bazaar model and is a great example of a success. Raymond's book, by the way, is apparently what convinced the Netscape Corporation to release its code and allow development of the Mozilla project with Firefox and Thunderbird and other programs, which now follow the Bazaar model of development. For anyone who hasn't used Firefox, it is probably the best browser available and Thunderbird is a great email program. If you haven't worked with them, especially Firefox, I suggest that you try them. After all, it's free and if you don't like it, you haven’t lost any money and you can use something else. That is one of the beauties of freeware.

There are advantages and disadvantages to both methods of open source development: the Cathedral model can be very efficient and new releases can be brought out rather quickly when only a few people are involved and work closely together. The Bazaar approach could only be successful once the networked information web we have today was created. Now, many people from around the world can be involved in a single project. The basic philosophy can be summed up in the statement “given enough eyeballs, all bugs are shallow”, which means that the more people who work on a program, the more quickly any problems can be discovered and solved, compared with when only a few people are involved.

It is obviously difficult to manage such huge projects with participants from literally anywhere, so developers created software programs specially designed for open-source development (these are also available as freeware), which can bring order to what would otherwise be a completely unmanageable situation. One of these programs has the curious name Bugzilla, which allows people to file problems and to track the progress of how computer bugs are being corrected. There are lots of other programs for doing this, however.
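
For readers of the transcript, the core idea behind such a bug tracker can be sketched in a few lines of Python. This is not how Bugzilla itself is written; the class names, the three status values and the sample reports are all hypothetical, chosen only to show filing a problem and tracking its progress.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical status names, chosen for this sketch only.
STATUSES = ("NEW", "ASSIGNED", "RESOLVED")


@dataclass
class Bug:
    bug_id: int
    summary: str
    reporter: str
    status: str = "NEW"
    # Every status the bug has passed through, oldest first.
    history: List[str] = field(default_factory=lambda: ["NEW"])


class Tracker:
    """Keeps every reported bug and the steps it has gone through."""

    def __init__(self) -> None:
        self.bugs: Dict[int, Bug] = {}
        self._next_id = 1

    def file_bug(self, summary: str, reporter: str) -> Bug:
        # Anyone can report a problem; it gets the next free number.
        bug = Bug(self._next_id, summary, reporter)
        self.bugs[bug.bug_id] = bug
        self._next_id += 1
        return bug

    def set_status(self, bug_id: int, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        bug = self.bugs[bug_id]
        bug.status = status
        bug.history.append(status)

    def open_bugs(self) -> List[Bug]:
        # Everything not yet resolved is still visible to all participants.
        return [b for b in self.bugs.values() if b.status != "RESOLVED"]


tracker = Tracker()
first = tracker.file_bug("Serial issue form loses the date", "cataloger_a")
second = tracker.file_bug("Search box ignores diacritics", "volunteer_b")
tracker.set_status(first.bug_id, "ASSIGNED")
tracker.set_status(first.bug_id, "RESOLVED")

print([b.summary for b in tracker.open_bugs()])
print(first.history)
```

The point of the design is simply that every report and every change of status is recorded in one shared place, so that participants who never meet can still see what is broken and who is dealing with it.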

An example of the Bazaar model is Firefox, since people can involve themselves in the development process, while Android, the operating system for mobile devices owned by Google, is an example of the Cathedral model since Google shares their code only after it has been developed.

The Cathedral and the Bazaar is also available for free on the web.

What Happens When There is a Difference of Opinion?
What happens when various groups want the software to develop in different ways? Several solutions are possible in these situations. One way is to have a kind of guru, who is generally accepted as such, so that he or she can just make the decision. Linus Torvalds, who developed the Linux kernel, sometimes does this with Linux. Another way of solving this problem is through voting and the bug-tracking software can handle this.

But what if these solutions fail and the disagreements are just TOO far apart? This is when something called a fork may occur. Ideally, forks should be seen as good developments since they allow different communities to do as they wish, but the reality is completely different, and some have compared forks to religious schisms. Forks occur when different groups want to develop different pieces of software; in other words, the groups will no longer cooperate, and the code each group writes can no longer simply be exchanged with the other. Therefore, it is a huge decision fraught with many responsibilities to embark upon a fork. Linux has forked many times in many ways. In the transcript, I provide a link to an image that shows the timeline of the forks that have occurred in Linux. Many lasted only a short time.

One recent example of a fork is OpenOffice, which has been free software for around ten years and developed by Oracle according to the Cathedral model. Many, including myself, find it better than Microsoft Office for their own purposes. It turned out that some of the developers got concerned that OpenOffice was not being developed actively enough by Oracle and in 2010, they created a fork called LibreOffice to ensure development. They asked Oracle to participate in the project but Oracle was not at all happy about it, refused to participate and demanded that the fork be shut down. Those developers continued and wound up creating their own organization called “The Document Foundation”.
(OpenOffice and LibreOffice)

LibreOffice is now adopting the Bazaar development model but the founders of LibreOffice maintain that LibreOffice actually is OpenOffice but since they cannot use the OpenOffice logo as it is owned by Oracle, they changed the name—kind of.

To get around many of the pressures to create new software forks, the main programs are becoming more flexible. Today there is the possibility of creating add-ons, plugins or extensions in various ways. These are pieces of software that cannot be run independently, whereas a fork is an independent program. Add-ons are designed only to enhance another program and will not run on their own. I won't discuss these in detail since there are so many different kinds for so many different programs, but let’s just say that they have become wildly popular and there are hundreds, if not more, for all kinds of programs. As a rule, add-ons are much simpler to make than altering the source code of an existing program, and the developer can remain relatively independent. Yet because add-ons are still based on the main program, there is also the danger that your add-on may become inoperative when the main program is updated. Anybody who has used Firefox with add-ons has discovered, upon updating Firefox, that half of the add-ons no longer function and need to be updated as well.
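
To make the add-on idea concrete for readers of the transcript, here is a small Python sketch. Everything in it is hypothetical (the Host class, the tested_up_to version check, the two sample add-ons); real programs such as Firefox have their own, far richer extension systems. It only shows that an add-on does nothing without its host and can silently drop out when the host is updated.

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, int]  # (major, minor), a simplification for this sketch


class Host:
    """Stands in for the main program, such as a browser or a catalog."""

    def __init__(self, version: Version) -> None:
        self.version = version
        # add-on name -> (newest host version it was tested with, its function)
        self.addons: Dict[str, Tuple[Version, Callable[[str], str]]] = {}

    def register(self, name: str, tested_up_to: Version,
                 func: Callable[[str], str]) -> None:
        self.addons[name] = (tested_up_to, func)

    def active_addons(self) -> List[str]:
        # An add-on stays active only if it was tested with this host version.
        return sorted(name for name, (tested, _) in self.addons.items()
                      if tested >= self.version)

    def render(self, text: str) -> str:
        # The host decides when each add-on runs; add-ons never run alone.
        for name in self.active_addons():
            text = self.addons[name][1](text)
        return text


host = Host((1, 0))
host.register("shout", (1, 0), str.upper)
host.register("brackets", (2, 0), lambda s: "[" + s + "]")

before = host.render("title")   # both add-ons are applied
host.version = (2, 0)           # the main program is updated
after = host.render("title")    # "shout" was never tested with 2.0

print(before, after, host.active_addons())
```

Notice that nothing in the "shout" add-on itself changed; it stopped working only because the program it depends on moved on without it.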

Since add-ons are so popular, some proprietary programs, such as Internet Explorer, allow them. But dependence on proprietary software is by definition even more difficult: because of the closed development model, the developer of the add-on can do nothing until after the new version of the proprietary program is released.

Consequently, we can see that those who create add-ons assume a major responsibility to keep current with the programs they are dependent on.

Wild West?
From this discussion, it may seem as if open source development is like the wild West of the 19th century, and that what we really need is a tough sheriff who will take charge, clean up the town, and bring some bit of order to it, someone like Gary Cooper as Will Kane in High Noon.

While that may be true, there is a different way of looking at it: imagine we are in one of those nasty little western towns we see in the movies, where everything and everyone is owned by the local cattle baron who spends his time drinking and gambling in the brothel he owns, and the little homesteaders (i.e. developers) are almost helpless in the face of all that wealth and power (imagine Microsoft or Apple or Oracle or whatever company you prefer in the role of cattle baron). Since the cattle baron is interested only in adding to his own money and power, he must stifle all competition wherever he sees it. In this sense, open source seems more like Alan Ladd as Shane, who saves everybody from the cattle baron, and each homesteader can become a genuinely productive citizen.

Alan Ladd thereby cleans up the town in quite a different way from how Gary Cooper does it.

By the way, if you haven't seen these movies, you should do so. 

I confess that much of this is of limited interest to people who are interested only in using the programs—except that the open source/freeware ones cost them much less money, if anything at all! What does this have to do with catalogs?

Open Source Library Catalogs
First, there are several free, open source library catalogs available. Now we understand that they are not free in the sense of free puppies, but they are free in quite another way. This, I think, brings something that is literally brand-new, or at the very least well-forgotten, to the library cataloger community. For a long time now because of the nature of the library catalogs we have purchased, catalogers have been told how to do their work. You ask: “How do I add a serial issue?” The answer is that in proprietary library catalog X, you add a serial issue this way. In proprietary library catalog Y, you add a serial issue this other way. In other proprietary catalogs, it may be different. It doesn’t matter if you like any of those ways or not, or what you think about them. It is their way or the highway, and there is no discussion. Sure, the library can ask the companies to change something, but we know where that normally leads. 

With open source library catalogs, it is completely different. While there is usually a default method to do anything, such as adding a serial issue, when dealing with an open source library catalog, the real answer to a question such as “How do I add a serial issue?” should be “How would you like to do it?” Sometimes, a change you suggest can be quick and easy, while at other times, it can be more intensive. You can hire out a computer science student to do the work if you don’t have the expertise internally. But the amount of labor and costs are beside the point. The main thing is: you can change it, that is if you want to. This is the freedom that freeware promises.
As we have seen, there are all kinds of methods for making a program, or in this case, a library catalog, work the way you want it to: from changing the code, to simply adding some links, or creating an add-on. You can even take out your entire catalog and put it into another software program, as the eXtensible Catalog does, at least as I understand it.

The only limit is your own imagination, but this is easier to say than to genuinely accept. Open source catalogs can be quite different from proprietary catalogs--they don’t have to be but they can be--and it takes some time to get used to, but once you do, it opens new possibilities and is quite liberating.

It can even be incredibly creative.

One of the basic assumptions that I haven't seen mentioned anywhere in all of the discussions of open source is: for any of it to work, people themselves must be open, and here I want to explore openness. By this I mean that the information you share with others who are working on similar projects must be truthful and honest, and consequently, you must share your progress and your successes, as well as your setbacks and your failures. Information about failures is essential.

For open source developers who are private companies, much of this may be considered to be sharing business secrets. But even for individuals, such openness can be exceptionally difficult and especially for the organizations they work for. 

Although people have been taught from the time they were children that they should share their problems and their failures along with their successes, there is something within us that rebels against such openness. I have never met anyone who enjoys admitting failures or pointing out inadequacies within himself or herself. When the pressures and competitions found within organizations and between them are added into this scenario, it can be very difficult indeed to find openness. For instance, I have had many private emails from librarians around the world who tell me of the difficulties within their organizations, or between competing—oh! Excuse me! That should be cooperating organizations. They tell me the problems, but there is almost always the proviso that I can use the often invaluable information they give me, but only so long as they and their organizations remain anonymous.

It turns out that many of the problems these people are facing are in essence, the same everywhere except for the details: similar problems with systems, similar friction among competing--excuse me again!--cooperating divisions, similar problems of understanding and so on and so on.

Of course, I will always keep everything anonymous when asked to do so. I completely understand the need, but I will state that it is truly unfortunate there is such a need to do it. The result is that officially, all matters appear to be under control, but in reality, individuals often suffer from a tremendous lack of information and at times this may lead each of them to believe that he or she is the only one facing these difficulties. This makes them feel inadequate, and they may consider themselves to be failures, when often almost every other organization is dealing with the same problems, yet there remains this need to pretend to the outside world that everything is OK.

Associated with this is something very interesting I have discovered that has to do with people in academia, who very often have tenure or its equivalent. Of course, one of the major justifications of tenure is to ensure open discussion since those with tenure are not supposed to worry about losing their positions if they say something that they believe sincerely but that happens to be unpopular with upper echelons. In this sense, openness is codified by tenure itself.

Naturally, matters are rarely so simple and it turns out that many who work in academia and are protected by tenure-type securities still do not feel very secure when speaking their minds.

There is no need to go into this any further. Let me state openly that I am very much 100% pro-tenure and wish it could be extended far more widely outside of academia, but nevertheless, tenure still has its faults. The lack of openness even for those with secure positions, something which appears so perplexing on the surface, is truly an unfortunate consequence and retards progress.

Why am I talking about Open Source development in a Discussion on Cataloging?
The reason I am discussing open source development is because it is a genuinely new business model (or as I suspect, it is more correctly a rediscovery of a very old model, but that is still another topic I won't discuss at this moment). The fact is, open source development has a very active history both of proven successes and of failures, along with a huge number of problems accompanied by all kinds of solutions that have shown themselves to be more-or-less successful.

It is my feeling that by using something very similar to open source development, especially following the Bazaar model, we could come up with new cataloging rules, or an open cataloging standard. This would require that many, very many, librarians and metadata creators be involved in developing the standard, and that matters be worked out correctly. I suspect the add-ons concept would help: it would allow the development of a very basic core set of rules, while different specialized communities, such as legal, cartographic, theological, musical, Slavic languages, along with other communities I cannot even imagine right now, could be “added on” in some fashion. In this way we could build comprehensive rules branching out from the core to include many, many communities.

Could cataloging rules be built on such an open model? There are already several examples of “open standards” (although there are several definitions of what that means). Wikipedia has a number of open standards listed.

Most of these standards deal with computer standards of coding, but I see no reason why the basic model could not be extended to other tasks such as cataloging rules or perhaps even to other types of standards as well. The main idea is that everyone--and I repeat everyone--can participate equally in the development of the standard if they wish. This is something new.

Even the equivalent of add-ons for each specialized community could evolve into associated or subordinate standards. Marvelous tools such as Skype, presentation software, Google Translate, and even that weird Second Life are available, which allow for unparalleled international cooperation today. Of course, the final product would be free to use, to consult, and download for each person’s own purposes. Yes, each of these could be changed as well, but I would hope that if matters were set up correctly, all could work together, at least to a point.

Does this envision a wild west environment, or one of newfound freedom? I have my own opinions, but each person would have to answer that question individually.

The facts are: We know that these development models and tools have succeeded in the past and they continue to succeed today. They can help to create some extremely important projects in the world, but for these models to work, there are certain responsibilities required, primary among them, a willingness to cooperate in genuine ways—not only passively, and not only in words, but in deeds. As Linus Torvalds has said, “Talk is cheap, show me the code.”

Can librarians do something like this? That this kind of a system could succeed, I have no doubt because there is such a record of success in other projects. Whether it would succeed is an entirely different question of course. Could the Cooperative Cataloging Rules be a part of such an open standards system? I would hope so but I would happily see it all forgotten if something better and genuinely useful came along.

Open is Not Good for Everything
In a recent article in the London Review of Books, Jenny Diski talks about modern publishing practices of literature that closely resemble the open model in her article titled either Short Cuts or The Future of Publishing, I can’t figure out which is the title. The author describes how in modern literary writing, websites are often set up today where the authors post drafts of their novels, people comment on those drafts (sometimes the people pay for this opportunity), and the authors can change their drafts based on the comments from their readers.

This can also be described as an open-source development model for a novel. As the author says,

“Unbound [one of these websites] suggests itself as a radical move away from commercial publishing, but instead of an alternative, it’s the concentrated essence of marketing. No one is taking any risks or making a leap of faith. This is a crowdsourcing model that is as crowd-pleasing as populist publishing, but on a smaller, safer scale. Readers control what the authors can write. In the past, libraries and bookshops were places you went to to find excitement. The excitement Unbound offers is that of a horse-race with a chance to feel up your horse’s fetlocks before it runs.”
Although I, too, have a natural desire for money, this article makes clear that when it comes to personal, creative efforts, the open model can be carried much too far. At least it can for me.

To end this podcast, made during a winter in central Italy, I can think of no better music selection than from Vivaldi’s Four Seasons, the famous first movement from the fourth concerto, the one labelled “Winter”.

It turns out that Rome gets a lot of rain in the winter, but very rarely gets any snow at all. When I see satellite photos showing all of Italy blanketed under snow, sometimes rather deep, while the little area around Rome is clear, I always think that those old Romans really knew what they were doing when they chose this spot for their city.

This recording is from the Internet Archive, where you can find different versions of Vivaldi’s Four Seasons in their entirety. This is a recent performance, from November 3, 2011, by the Wichita State University Chamber Players.

That’s it for now. Thank you for listening to Cataloging Matters with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic city in the world.

1 comment:

  1. Interesting episode; Was hoping you would discuss more about your experiences with Koha and your thoughts on other Open Source ILSes like Evergreen, etc. Thanks for the backstory on the split between OpenOffice and LibreOffice, I did not know the details of that.

    Small tech note, the audio on this one is weird; your voice track was only reaching me on the left channel but not the right--but for the movie clips the audio was fine on both channels (which startled me).