On 06/06/2011 20:12, Myers, John F. wrote:
<snip>
This does not agree with my experience. I have seen computer programs using regular expressions that deal with far more complex issues than this. Differentiating specific characters from specific digits is done every single day. The way the information appears in these fields is highly controlled, as you point out with the 'c' or the copyright symbol, and the use of brackets, hyphens and question marks. There are also a limited number of terms used, e.g. "before", "or", "between", probably amounting to a couple of handfuls of words.
The solution Mac proposed, of using copyright symbol/lowercase 'c'/term 'copyright', would not be effectual from a data standpoint. The desirable outcome is to cleanly record the copyright date as a numerical value, actionable by the machine as a number. This is not possible if it remains coupled in the 260 $c with date of publication data that can appear as a string of digits recording the transcribed date, a possibly incomplete string of digits in brackets for a conjectured date, or textual indication that there is not a date provided. There is also the complexity of the three textual/symbolic options to prefix the copyright date itself. Both sets of conditions require a machine to treat the characters we see as numbers largely as a textual string that can be displayed but not acted upon.
I would almost bet the baby's shoes that this could be done well enough to provide something useful, that is, far more useful than expecting everybody to change all their practices, or just saying that it's too complicated.
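A rough sketch of the kind of regular-expression approach described above might look like this in Python. It is only an illustration under stated assumptions: the function name parse_260c, the two patterns, and the sample strings are hypothetical and not taken from any existing system. It deliberately ignores ranges such as "[between 1906 and 1912]" and incomplete dates such as "[19--]", which would need their own patterns.

```python
import re

# Copyright date: the © symbol, a lowercase 'c', or the word "copyright",
# followed by four digits. (Hypothetical pattern, for illustration only.)
COPYRIGHT_RE = re.compile(r"(?:©|\b[Cc]opyright\s*|\bc\s?)(\d{4})\b")

# Any remaining standalone four-digit number is treated as the publication year.
PUB_YEAR_RE = re.compile(r"\b(\d{4})\b")


def parse_260c(value):
    """Split a 260 $c string into numeric publication and copyright years."""
    result = {"published": None, "copyright": None, "conjectured": False}

    # Pull out the copyright date first, then remove it from the string
    # so it cannot be mistaken for the publication date.
    m = COPYRIGHT_RE.search(value)
    if m:
        result["copyright"] = int(m.group(1))
        value = value[:m.start()] + value[m.end():]

    # Look for a publication year in what is left.
    m = PUB_YEAR_RE.search(value)
    if m:
        result["published"] = int(m.group(1))
        # Brackets or a question mark signal a conjectured date.
        result["conjectured"] = "[" in value or "?" in value

    return result
```

For example, parse_260c("1985, c1983.") gives a publication year of 1985 and a copyright year of 1983, both as integers the machine can act on, while parse_260c("[1999?]") gives 1999 flagged as conjectured.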