Germanic Lexicon Project
Message Board
Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2004-10-12 13:40:13
Subject: Re: Global corrections
> Sean et al.
>
> Hm. I found the eth for oacute to be common in some environments, specifically
> "mód" (doing bt_b0225, -26 right now). These were
> reserved yesterday.
Yeah, that doesn't surprise me; I know the program doesn't catch every single case. My main worry is whether it introduced some other kind of problem that I didn't notice.
In the case of eájj-mðd, for example, the program wasn't able to correct the error because there is a second error in the word (jj should be þ). The program couldn't find the (nonexistent) word eájj-mód in the Toronto DOE corpus, so it didn't change the word.
Hyphens and similar characters can be a problem. The program already handles upper/lower case, strips punctuation from the beginning and end of the word, and so on. I could make it more complicated still to handle hyphens and the like; but the more complicated the program, the greater the chance of accidentally introducing some other kind of error.
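For anyone curious, here is a rough sketch of the kind of corpus-guarded correction described above. It is not the actual program: the names `ATTESTED`, `split_punct` and `correct` are made up for illustration, the one-word set stands in for the real corpus word list, and replacing every ð in a word is a simplification. The point is only that a substitution is kept when the resulting word is attested, and otherwise the word is left alone.

```python
import string

# Hypothetical stand-in for the set of word forms attested in the corpus.
ATTESTED = {"mód"}

def split_punct(token):
    """Split a token into (leading punctuation, core, trailing punctuation)."""
    punct = string.punctuation
    lead_len = len(token) - len(token.lstrip(punct))
    core = token.strip(punct)
    if not core:
        # The token is nothing but punctuation.
        return token, "", ""
    return token[:lead_len], core, token[lead_len + len(core):]

def correct(token, attested=ATTESTED):
    """Replace ð with ó only if the result is an attested word."""
    lead, core, trail = split_punct(token)
    if "ð" not in core:
        return token
    # Simplification: every ð in the word is replaced at once.
    candidate = core.replace("ð", "ó")
    # Compare case-insensitively, but keep the original capitalization.
    if candidate.lower() in attested:
        return lead + candidate + trail
    # Not attested (e.g. a second error in the word): leave it unchanged.
    return token

for word in ["mðd", "Mðd,", "eájj-mðd"]:
    print(word, "->", correct(word))
```

Note that only punctuation at the edges is stripped, so a hyphen inside a word stays part of the form that gets looked up. That is why eájj-mðd passes through untouched in this sketch as well.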
--Sean