Germanic Lexicon Project
Message Board
Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2004-10-12 14:06:08
Subject: Re: M-Dash Spacing & Eth for D-Bar in BT?
> It seems the appearance of spaces before colons and before and after M-Dashes
> is deceptive in most cases, rather a matter of typesetting than punctuation. Should
> I assume that unless otherwise apparent (as according to normal rules of
> orthography and punctuation) no space should fall before colons and before
> and after m-dashes? Often this appearance of what is in most cases a
> (half-)space is registered on the OCR page as full space.
I've been treating :-- as its own separate "word" which gets a space both before and after. Think of it as a special separate symbol which marks a particular boundary within the entry.
The reason is that this makes text processing a lot easier. If the spaces aren't there, then the preceding and following words are joined into one unit. For example, when the index for the Search system gets rebuilt, we don't want to index river:--Be as one word. Of course, one possible solution is to complicate that program so that it knows to split this into river and Be. But processing the text word by word needs to be done pretty often, so this complication would have to be handled again and again (as with the ð to ó program, or the part of the Search system which highlights words in red, etc.).
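To illustrate the point, here is a minimal sketch in Python. It is not the project's actual code; the function name `tokenize` is hypothetical, and the only sample text is the river / Be example from above.

```python
def tokenize(entry_text):
    """Split entry text into words on whitespace (hypothetical helper)."""
    # str.split() with no argument splits on any run of whitespace.
    return entry_text.split()


# With spaces around the marker, :-- comes out as its own "word".
print(tokenize("river :-- Be"))

# Without the spaces, everything is fused into a single unit,
# which is what we don't want to end up in the index.
print(tokenize("river:--Be"))
```

With the spaces in place, every word-by-word program can use a plain whitespace split and never needs to know about :-- at all.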
If someone wanted to typeset the whole dictionary without the spaces there, it would be easy to remove them with a simple search-and-replace.
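A sketch of that search-and-replace, again in Python and again only an illustration (the function name `remove_marker_spaces` is hypothetical, and it assumes the marker is always written with exactly one space on each side):

```python
def remove_marker_spaces(text):
    """Close up the spaces around the :-- marker for typesetting."""
    # Literal replacement: " :-- " becomes ":--".
    return text.replace(" :-- ", ":--")


print(remove_marker_spaces("river :-- Be"))
```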
> Also I am assuming that nobody else is using đ for eth in BT, though
> the character employed is certainly the former.
Yeah, you're right. Some texts do distinguish đ from ð (Torp does, for example, which is why đ and ð are distinguished in the character list). I haven't seen any cases where BT makes that distinction, so their use of đ amounts to a mere graphic variant. We're not losing any contrast between characters by using the more widely used ð character.
You ask really good questions, BTW.
--Sean