On 16/10/2012 23:04, Deborah Fritz wrote:
Putting all the legacy MARC records out of our mind, it is *really* important to record this information, if you want to be able to, someday, reliably machine map the data in the new RDA/MARC records you are creating into a new encoding format that is used for data recorded under RDA principles.
Yes. If the purpose is to turn our text into data, the people who mine that data need to be assured of at least some level of reliability in their search results. To expand on the earlier analogy of using the code “Avocados” or “Green” when you are looking at a list of relators that do not fit your own (as will happen very often, particularly when cataloging non-English language materials), here is another classification, from the National Agricultural Thesaurus: http://agclass.nal.usda.gov/mtwdk.exe?k=default&l=60&s=1&n=1&y=0&w=avocados&t=3
: : : : : : : Economics, Business and Industry
: : : : : : products and commodities
: : : : : agricultural products
: : : : : Plant Science and Plant Products
: : : : plant products
: : : fruit products
: : fruits (food)
: tree fruits
Of course, all catalogers are taught, even in library school, that when assigning subjects it is vital to assign them at the greatest level of specificity (the principle of “specificity”). Therefore, if you have a book on “avocados”, it is incorrect to assign the heading “agricultural products”, because doing so devalues both subjects, “avocados” and “agricultural products”. Someone looking for “avocados” should be able to retrieve the book on avocados without knowing that they have to look under “agricultural products”, and someone looking under “agricultural products” should not have to sort through books specific to avocados (along with those on other fruits and plants, etc.).
This is especially true in a networked union catalog. Even if your own library has no other works within this entire hierarchy, a book on “avocados” should still be assigned that specific subject and not “Economics, Business and Industry”, because your record will be searched along with those of all the other libraries that have lots of books on all of those topics. Doing otherwise would lead to a breakdown in the union catalog. Naturally, there is also the understanding that a library collection is a growing entity: in a few years your collection could also contain a large number of works on each of these subjects, and the book you are cataloging today should fit into that future collection as well as possible.
The purpose of all this labor is to provide the searcher with a level of reliability when searching by subject. Otherwise, it becomes more a matter of luck whether a search for “avocados” or “agricultural products” provides a meaningful result.
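To make the point about specificity concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the BROADER table is a simplified single chain loosely based on the hierarchy quoted above (the link from “tree fruits” to “avocados” and the merging of the two branches are my own simplification), and the three records and the search function are hypothetical, not any real catalog system. It only shows that a machine can broaden a specific heading by walking up the hierarchy, but cannot narrow a heading that was assigned too broadly.

```python
# Hypothetical broader-term table: each term maps to its broader term.
# Simplified to a single chain for illustration; not the real thesaurus data.
BROADER = {
    "avocados": "tree fruits",
    "tree fruits": "fruits (food)",
    "fruits (food)": "fruit products",
    "fruit products": "plant products",
    "plant products": "agricultural products",
}

# Hypothetical records: two books about avocados and one general work.
RECORDS = {
    "book_specific": {"avocados"},              # cataloged at the specific level
    "book_broad": {"agricultural products"},    # avocado book cataloged too broadly
    "book_general": {"agricultural products"},  # genuinely general work
}


def ancestors(term):
    """Return every broader term above `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def search(term, expand=False):
    """Return the sorted ids of records matching `term`.

    With expand=True, a record also matches when `term` is a broader
    term of one of its assigned subjects.
    """
    hits = []
    for record_id, subjects in RECORDS.items():
        for subject in subjects:
            if subject == term or (expand and term in ancestors(subject)):
                hits.append(record_id)
                break
    return sorted(hits)


print(search("avocados"))                            # exact match only
print(search("agricultural products"))               # exact match only
print(search("agricultural products", expand=True))  # walks up the hierarchy
```

In this sketch the search for “avocados” finds only the record cataloged specifically, so the avocado book cataloged under “agricultural products” is lost to that searcher, and it also clutters the exact search under “agricultural products”. The expanded search can still pull the specifically cataloged book up into the broader result, which is the direction a machine can handle on its own.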
Should such concerns also apply to relator codes? (And as Deborah pointed out, let’s set aside the rather messy problem of the so-called “legacy data”.) When a researcher in the humanities wants to “data mine” our records for, e.g., “writers of added text”, what should he or she expect to find? How can researchers get results that they can actually rely upon?