Germanic Lexicon Project
Message Board
Author: Keith Briggs
Date: 2004-11-05 05:31:16
Subject: Re: Probabilistic correction
> The thing that would really save us the most work is something that actually makes corrections in individual words automatically. Even if it corrected just, say, 10% of the errors, that would translate into a big savings of effort with the hand-corrections.
> --Sean
I did some more tests. I trained on 30 c (corrected) and u (uncorrected) files, and the classification of an unknown file seems to be correct every time. So I went on to the next stage: trying to classify individual lines of an unknown file. This was not good enough to be useful. Some lines with errors were classified as correct, and some correct lines as containing errors. So going down to the level of individual words is not likely to work.
After all, this is a probabilistic approach, so a large enough chunk of text is needed for statistical analysis to make sense.
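To illustrate why chunk size matters, here is a minimal sketch of this kind of classifier. The post does not say which model was used, so everything here is an assumption for illustration: a character-trigram model with add-one smoothing, a log-likelihood ratio as the decision rule, two toy training strings standing in for the c and u files, and the hypothetical names `trigrams`, `train`, `log_prob` and `classify`.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the overlapping character trigrams of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def train(texts):
    """Count character trigrams over a list of training texts."""
    counts = Counter()
    for t in texts:
        counts.update(trigrams(t))
    return counts


def log_prob(text, counts, vocab_size):
    """Add-one-smoothed log-probability of text under one trigram model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in trigrams(text)
    )


def classify(text, counts_c, counts_u):
    """Return (label, llr); llr > 0 favours 'c', llr < 0 favours 'u'."""
    # Shared vocabulary size, plus one slot for unseen trigrams.
    vocab_size = len(set(counts_c) | set(counts_u)) + 1
    llr = (log_prob(text, counts_c, vocab_size)
           - log_prob(text, counts_u, vocab_size))
    return ("c" if llr >= 0 else "u"), llr


# Toy stand-ins (assumed data) for corrected and uncorrected training files.
counts_c = train(["the king and the queen went to the hall"])
counts_u = train(["tbe kiug aud tbe qneen weut to tbe ball"])

for chunk in ["the king went to the hall",   # clean line
              "tbe kiug weut to tbe ball",   # line with OCR-style errors
              " to "]:                       # a single short word
    label, llr = classify(chunk, counts_c, counts_u)
    print(f"{chunk!r}: {label} (log-likelihood ratio {llr:+.2f})")
```

In this toy setup each trigram contributes one term to the ratio, so a whole line accumulates far more evidence than a single word. The short word " to " occurs equally often in both training strings, so its ratio is zero and the label is decided only by the tie-break.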
But there is plenty more I can try. For one thing, the ostensibly correct files that I used for training may actually contain errors. Also, I should really train on ALL the uncorrected files, so that the system sees as many errors as possible.
Sean: would you be able to send me a gzipped tar file of these? It's too slow to fetch them one at a time by hand.
Keith
Messages in this thread

Subject | Name | College/University | Date
Probabilistic correction | Keith Briggs | | 2004-11-04 05:41:11
Re: Probabilistic correction | Keith Briggs | | 2004-11-04 07:49:10
Re: Probabilistic correction | Sean Crist | Swarthmore College | 2004-11-04 22:42:53
Re: Probabilistic correction | Keith Briggs | | 2004-11-05 05:31:16
Re: Probabilistic correction | Keith Briggs | | 2004-11-05 06:59:54
Re: Probabilistic correction | Keith Briggs | | 2004-11-05 07:29:53
Re: Probabilistic correction | Sean Crist | Swarthmore College | 2004-11-05 09:32:30
Re: Probabilistic correction | Sean Crist | Swarthmore College | 2004-11-05 09:48:16
Re: Probabilistic correction | Keith Briggs | | 2004-11-08 05:07:19
Re: Probabilistic correction | Sean Crist | Swarthmore College | 2004-11-08 09:12:45
Re: Probabilistic correction | Keith Briggs | | 2004-11-08 09:46:59
Re: Probabilistic correction | Keith Briggs | | 2004-11-08 10:02:13
Re: Probabilistic correction | Keith Briggs | | 2004-11-08 12:10:56
Re: Probabilistic correction | Sean Crist | Swarthmore College | 2004-11-08 15:26:04
Re: Probabilistic correction | Keith Briggs | | 2004-11-09 06:47:45
Re: Probabilistic correction | Keith Briggs | | 2004-11-09 08:50:46
Re: Probabilistic correction | Keith Briggs | | 2004-11-09 09:43:19
Re: Probabilistic correction | Keith Briggs | | 2004-11-09 10:59:49
Italics (was: Probabilistic correction) | Sean Crist | Swarthmore College | 2004-11-09 13:39:13
Re: Probabilistic correction | Keith Briggs | | 2004-11-11 06:57:20