Germanic Lexicon Project
Message Board


Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2004-10-12 13:40:13
Subject: Re: Global corrections

> Sean et al.
>
> Hm. I found the eth for oacute to be common in some environments, specifically
> "mód" (doing bt_b0225, -26 right now). These were
> reserved yesterday.

Yeah, that doesn't surprise me; I know that the program doesn't catch every single case. My main concern is that it may have introduced some other kind of problem which I didn't notice.

In the case of eájj-mðd, for example, the program wasn't able to correct the error because there is a second error in the word (jj should be þ). The program couldn't find the (nonexistent) word eájj-mód in the Toronto DOE corpus, so it didn't change the word.
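To illustrate the behaviour described above, here is a minimal sketch in Python. It is not the actual correction program: the function name `correct_eth` and the tiny `attested` set (standing in for a word list drawn from the Toronto DOE corpus) are hypothetical, and replacing every ð in the word at once is a simplifying assumption.

```python
def correct_eth(word, attested):
    """Replace a misread eth (ð) with o-acute (ó), but only if the
    result is an attested word; otherwise leave the word untouched."""
    if "ð" not in word or word in attested:
        return word  # nothing to fix, or the word is already a known form
    candidate = word.replace("ð", "ó")  # assumption: all ð replaced together
    return candidate if candidate in attested else word


# Made-up stand-in for the corpus word list.
attested = {"mód", "ðæt"}

print(correct_eth("mðd", attested))       # corrected, since "mód" is attested
print(correct_eth("eájj-mðd", attested))  # unchanged: "eájj-mód" is not attested
print(correct_eth("ðæt", attested))       # unchanged: a legitimate eth
```

Because a change is made only when the candidate is attested, a second error in the same word (such as jj for þ) blocks the correction rather than producing a new wrong form.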

Hyphens and the like can be a problem. The program already handles things such as upper/lower case and stripping off punctuation at the beginning and end of the word. I could make it still more complicated to handle hyphens as well; but the more complicated the program, the greater the chance of accidentally introducing some other kind of error.
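As a rough sketch of the kind of normalization mentioned above (again, not the actual program; the helper names `split_punctuation` and `lookup_key` are hypothetical), a token can be split into leading punctuation, core word, and trailing punctuation, with the lowercased core used as the lookup key. A word-internal hyphen is left in place here, which is exactly the case that still causes trouble.

```python
import string


def split_punctuation(token):
    """Split a token into (leading punctuation, core, trailing punctuation)."""
    lead_len = len(token) - len(token.lstrip(string.punctuation))
    core = token.strip(string.punctuation)
    return token[:lead_len], core, token[lead_len + len(core):]


def lookup_key(token):
    """Lowercased core of the token, for looking up in a word list."""
    _, core, _ = split_punctuation(token)
    return core.lower()


print(split_punctuation("(mðd)."))  # punctuation is kept so it can be re-attached
print(lookup_key('"Mód",'))         # case and outer punctuation are ignored
print(lookup_key("eájj-mðd"))       # the internal hyphen survives
```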

--Sean

Messages in this thread       Name            College/University   Date
Global corrections            Sean Crist      Swarthmore College   2004-10-06 14:10:54
Re: Global corrections        Matthew Carver                       2004-10-12 13:05:35
Re: Global corrections        Sean Crist      Swarthmore College   2004-10-12 13:40:13
Re: Global corrections (JJ)   Matthew Carver                       2004-10-13 21:42:11
Re: Global corrections (JJ)   Sean Crist      Swarthmore College   2004-10-14 17:22:19