Germanic Lexicon Project
Message Board

Author: Keith Briggs
Date: 2004-11-05 05:31:16
Subject: Re: Probabilistic correction

> The thing that would really save us the most work is something that actually makes corrections in individual words automatically. Even if it corrected just, say, 10% of the errors, that would translate into a big savings of effort with the hand-corrections.
> --Sean

I did some more tests. I trained on 30 c (corrected) and u (uncorrected) files. The classification of an unknown whole file seems to be correct every time. So I went to the next stage: trying to classify individual lines of an unknown file. This was not good enough to be useful: some lines with errors were classified as correct, and vice versa. So going down to the level of individual words is unlikely to work.
After all, this is a probabilistic approach, so a large enough chunk of text is needed for statistical analysis to make sense.
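
To illustrate why chunk size matters, here is a minimal sketch in Python. It is not the classifier used in the tests above, whose details are not given in this message; it assumes a simple character-bigram model with add-one smoothing. The training strings and all function names are made up for illustration.

```python
import math
from collections import Counter

def train_bigram_model(texts):
    """Count character bigrams over a list of training strings."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - 1):
            counts[text[i:i + 2]] += 1
    return counts

def log_likelihood(text, counts, vocab_size):
    """Add-one-smoothed log probability of the bigrams in `text`."""
    total = sum(counts.values())
    score = 0.0
    for i in range(len(text) - 1):
        score += math.log((counts[text[i:i + 2]] + 1) / (total + vocab_size))
    return score

def classify(text, model_c, model_u):
    """Return ('c' or 'u', log-likelihood ratio); a positive ratio favours 'c'."""
    vocab_size = len(set(model_c) | set(model_u)) + 1  # +1 slot for unseen bigrams
    llr = (log_likelihood(text, model_c, vocab_size)
           - log_likelihood(text, model_u, vocab_size))
    return ("c" if llr >= 0 else "u"), llr

# Hypothetical toy data: clean text versus the same text with OCR-style errors.
model_c = train_bigram_model(["the king rode to the hall",
                              "then the thane spoke to the king"])
model_u = train_bigram_model(["tlie kmg rode to tbe hall",
                              "tlien tbe tliane spoke to tlie kmg"])

label_clean, llr_clean = classify("the king", model_c, model_u)
label_noisy, llr_noisy = classify("tlie kmg", model_c, model_u)
label_word, llr_word = classify("to", model_c, model_u)
label_line, llr_line = classify("tlie kmg rode to tbe hall", model_c, model_u)
```

In this sketch the log-likelihood ratio is a sum over bigrams, so a whole line accumulates far more evidence than a single short word such as "to", whose ratio stays close to zero. That is the same effect as above: the smaller the chunk, the less reliable the decision.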

But there is plenty more I can try. For one thing, the ostensibly correct files that I used for training may actually contain errors. Also, I should really train on ALL uncorrected files so the system sees as many errors as possible.
Sean: would you be able to send me a gzipped tar file of these? (It's too slow to fetch them one at a time by hand.)
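
For what it's worth, a gzipped tar file can be read directly, without unpacking it to disk first. The following is only a sketch using Python's standard `tarfile` module; the member names (`u0001.txt` etc.) and the helper names are hypothetical, and the demo archive is built in memory just so the example runs on its own.

```python
import io
import tarfile

def read_texts_from_tar(fileobj):
    """Return {member name: decoded text} for every regular file in a gzipped tar."""
    texts = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-file entries
            data = archive.extractfile(member).read()
            texts[member.name] = data.decode("utf-8", errors="replace")
    return texts

def make_demo_archive(files):
    """Build a small gzipped tar in memory (stand-in for the real archive)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer

# Hypothetical file names and contents, for illustration only.
demo = make_demo_archive({"u0001.txt": "tlie kmg", "u0002.txt": "tbe hall"})
texts = read_texts_from_tar(demo)
```

With a real archive on disk, the same function could be called on a file opened in binary mode, e.g. `open(path, "rb")`.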

Keith

Messages in this thread (subject, name, college/university, date):
Probabilistic correction Keith Briggs 2004-11-04 05:41:11
Re: Probabilistic correction Keith Briggs 2004-11-04 07:49:10
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-04 22:42:53
Re: Probabilistic correction Keith Briggs 2004-11-05 05:31:16
Re: Probabilistic correction Keith Briggs 2004-11-05 06:59:54
Re: Probabilistic correction Keith Briggs 2004-11-05 07:29:53
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-05 09:32:30
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-05 09:48:16
Re: Probabilistic correction Keith Briggs 2004-11-08 05:07:19
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-08 09:12:45
Re: Probabilistic correction Keith Briggs 2004-11-08 09:46:59
Re: Probabilistic correction Keith Briggs 2004-11-08 10:02:13
Re: Probabilistic correction Keith Briggs 2004-11-08 12:10:56
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-08 15:26:04
Re: Probabilistic correction Keith Briggs 2004-11-09 06:47:45
Re: Probabilistic correction Keith Briggs 2004-11-09 08:50:46
Re: Probabilistic correction Keith Briggs 2004-11-09 09:43:19
Re: Probabilistic correction Keith Briggs 2004-11-09 10:59:49
Italics (was: Probabilistic correction) Sean Crist Swarthmore College 2004-11-09 13:39:13
Re: Probabilistic correction Keith Briggs 2004-11-11 06:57:20