Germanic Lexicon Project
Message Board
Author: Dieter Studer (University of Zurich)
Date: 2006-03-04 08:35:34
Subject: Re: acute versus circumflex
Thanks for your reply!
I found a paragraph in Bosworth-Toller (a0001.png) concerning the acute. They write:
'With regard to the marks used to distinguish difference in the vowels it may be noticed that eá, eó are employed in all cases where the short ea, eo are not meant, e.g. sceóp, Goth. skóp, has the same symbols as leóf, Goth. liubs, etc.'
They are not very specific about the 'difference in the vowels', but I assume they mean hiatus.
> This question has come up before. The answer I've given is that Bosworth and Toller intended them to all be acutes, and that there's simply random variation as to how much the acute slants, probably because the typesetter didn't have enough of these somewhat unusual acute characters from a single run of pouring metal into the same mold.
I quickly checked page 0257, and the distribution of acute and circumflex seems to be consistent.
> Your post raises a new twist, namely the possibility that it's acute for hiatus and circumflex for length. If this is true, then at the very least, Bosworth and Toller aren't consistent about it, because I've found cases where there were multiple tokens of the same word within a single entry with this same substantial variation in the slant of the acute (or free interchange between acutes and circumflexes, to state it as you're identifying it). This is why I judged there not to be a meaningful distinction.
Could you tell me the name of the lemma where this free variation takes place?
> If you're right and I'm mistaken, however, then it's going to be a real major project to fix it at this point. I didn't train the OCR program to distinguish the two, and the hand corrections are more than halfway done. So I propose that we go on as we have rather than try to fix it in mid-stream (if a fix is needed). If we study this more carefully and it turns out that there's a real distinction, then probably, the least bad solution would be to do some kind of programmatic solution in the final clean-up after the hand corrections.
I totally agree that it is not advisable to change the correction routines at this point, and if there really is a meaningful distinction in the two diacritics, it should be fixed in the final clean-up.
> Actually, if the distinction is real, it could be somewhat fortuitous that it turned out this way. The graphical distinction between the two (if it's real) is so subtle that the OCR program would have never been able to distinguish them reliably and would have given essentially random results. It would have been an equally bad problem during hand corrections. So if it's a real distinction, then this might be one of those cases where a programmatic solution would probably give more accurate results than either OCR or human hand correction.
I think you're absolutely right: a replacement using regular expressions will be far more accurate than any hand correction.
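To make the idea concrete, here is a minimal sketch of what such a clean-up pass might look like. It rests on an unverified assumption: that the acute belongs only on the second element of the digraphs eá and eó, and that every other accented vowel should carry a circumflex. The rule itself, the function name retag_diacritics, and the character table are all hypothetical; ǽ and other special letters are not handled.

```python
import re
import unicodedata

# ASSUMED rule (not confirmed for Bosworth-Toller): keep the acute on the
# á/ó of the digraphs "eá"/"eó", turn every other acute vowel into a circumflex.
ACUTE_TO_CIRCUMFLEX = str.maketrans("áéíóúýÁÉÍÓÚÝ", "âêîôûŷÂÊÎÔÛŶ")

# Matches á/ó only when NOT preceded by e/E, plus all other acute vowels.
ACUTE_OUTSIDE_DIGRAPH = re.compile(r"(?<![eE])[áóÁÓ]|[éíúýÉÍÚÝ]")

def retag_diacritics(text):
    """Hypothetical clean-up step: rewrite acutes outside eá/eó as circumflexes."""
    # Compose base letter + combining accent into single code points first.
    text = unicodedata.normalize("NFC", text)
    return ACUTE_OUTSIDE_DIGRAPH.sub(
        lambda m: m.group(0).translate(ACUTE_TO_CIRCUMFLEX), text
    )

print(retag_diacritics("sceóp leóf stán"))  # sceóp leóf stân
```

If the real distribution turns out to follow a different rule, only the pattern would need to change; the corrected text itself would stay untouched until the final clean-up.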
Thanks again for your great work. It's a real pleasure to browse through this homepage!
Didi