Posting to open-bibliography
It looks like a wonderful project you are working on, and it should help a lot of people. It is also very interesting that we are seeing a return to the old scribal traditions of the pre-printing era, a tradition that has largely been forgotten. Today we take for granted that successive editions (1st, 2nd, 3rd, etc.) bring successive *improvements* to the text, but we should remember that this idea of “improvement” has only been with us since the introduction of printing. Before printing, it was entirely reversed: the farther a copy was from the original text, the more *corrupt* it was assumed to be, because every hand copy introduced new errors. Therefore, in a world where everything was a manuscript, the task was to reconstruct the original version. It is amazing to see that idea being transferred to the situation of today!
In my opinion, OPMV http://open-biomed.sourceforge.net/opmv/ns.html is a step in the right direction, but while it provides places to describe where, when and how changes took place and who made them, I see no provision for detailing *what those changes were*, which is what people really want. For an idea of what a traditional manuscript collation looks like, there is a good discussion at http://www.skypoint.com/members/waltzmn/Collations.html, with examples at http://www.skypoint.com/members/waltzmn/Collations.html#Samples. The same basic method is used for early printed books as well, although collations of those also take into account how the book was physically put together.
The traditional library determinations of an edition/manifestation that I pointed out before are based much more on changes in the physical item than on changes in the text. That is, if the transcriptions of the title page, formal edition statement, dates (within certain limits), physical paging, and series statement are all the same, the item is *assumed* to belong to the same edition/manifestation and is therefore handled as just another item (copy) of it. But of course, the text inside could be slightly different or even completely different, because librarians do not have the time to compare texts that thoroughly. The reverse holds as well: if something on the title page, the dates, etc. is *different*, it is considered a new edition even though the text may be exactly the same. (This has resulted in several scams by unscrupulous publishers, by the way; it also happens, more honestly, with US vs. UK publications.) In librarian terminology, this is called “content vs. carrier”. Library tradition, under pressure for productivity, has almost always concentrated much more on the carrier.
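To make the “content vs. carrier” distinction concrete, here is a minimal sketch. The `Copy` record, its field names and the sample values are all hypothetical, chosen only to mirror the transcribed elements listed above. It also compares dates by exact equality, ignoring the “within certain limits” tolerance that real cataloguing rules allow.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Copy:
    """Hypothetical record: the carrier fields a cataloguer transcribes, plus the full text."""
    title_page: str
    edition_statement: str
    date: str
    paging: str
    series: str
    text: str

# The fields the traditional (carrier-based) test looks at; the text is not among them.
CARRIER_FIELDS = ("title_page", "edition_statement", "date", "paging", "series")

def same_by_carrier(a: Copy, b: Copy) -> bool:
    """Carrier test: same edition/manifestation if every transcribed field matches.
    Assumption: dates are compared exactly, with no tolerance."""
    return all(getattr(a, f) == getattr(b, f) for f in CARRIER_FIELDS)

def same_by_content(a: Copy, b: Copy) -> bool:
    """Content test: compare the texts themselves."""
    return a.text == b.text

# Hypothetical examples of the two failure modes described above.
us = Copy("The Book", "1st ed.", "1990", "x, 300 p.", "Series A", "Full text ...")
uk = replace(us, title_page="The Book (UK)")                # new title page, same text
revised = replace(us, text="Full text, quietly corrected ...")  # same carrier, new text

if __name__ == "__main__":
    print("US vs UK      carrier:", same_by_carrier(us, uk), " content:", same_by_content(us, uk))
    print("US vs revised carrier:", same_by_carrier(us, revised), " content:", same_by_content(us, revised))
```

The carrier test treats the silently revised copy as the same edition and the UK reissue as a new one, which is exactly the mismatch described above.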
Naturally, the traditional collation methods for manuscripts cannot be used on web materials, and I think the librarian emphasis on carrier does not serve well in a digital environment either. Still, the final product of a manuscript collation can be quite nice, since it details the changes very clearly. Modern tools can produce this kind of result automatically, e.g. the Wikipedia history pages http://tinyurl.com/33khhjj, where you can select any two versions you want and the changes between them are displayed very clearly.
In your case, could you do something similar to the Wikipedia history page, perhaps by running file comparisons (diffs) between versions?
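As a rough illustration of what such a file comparison could look like, here is a minimal sketch using Python's standard `difflib` module. The function name `show_changes` and the two sample records are hypothetical; a real system would compare whatever serialized form the records are actually stored in.

```python
import difflib

def show_changes(old: str, new: str,
                 old_label: str = "version 1", new_label: str = "version 2") -> list[str]:
    """Return a line-by-line unified diff between two versions of a text.
    Lines starting with '-' were removed, '+' were added, ' ' are unchanged context."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))

if __name__ == "__main__":
    # Hypothetical bibliographic records, used only for illustration.
    v1 = "Title: Open Bibliography\nEdition: 1st\nPages: 120"
    v2 = "Title: Open Bibliography\nEdition: 2nd\nPages: 124"
    for line in show_changes(v1, v2):
        print(line)
```

Like the Wikipedia history page, this records *what* changed, not just who changed it and when. For a side-by-side HTML view closer to Wikipedia's display, `difflib.HtmlDiff().make_file(...)` produces a comparison table from the same two lists of lines.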