Germanic Lexicon Project
Message Board
Author: Keith Briggs
Date: 2004-11-08 12:10:56
Subject: Re: Probabilistic correction
What about something like this:
0. We build dbacl models of English, OE (correctly accented; are such texts available?), and Latin (plenty available, e.g. http://penelope.uchicago.edu/Thayer/E/Roman/home.html) from corpora of those languages, *not* from the corrected BT files.
1. We strip all punctuation and HTML markup from the BT files we want to check, and replace &aacute; by á etc.
2. Using a language guesser (e.g. the trigram code here http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576), we identify maximal groups of adjacent words all in the same language.
3. We check each of these groups using the appropriate model built in step 0.
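Steps 1 and 2 might be sketched roughly as follows. This is only an illustration under stated assumptions: the names normalise, group_by_language and toy_guess are made up here, and toy_guess is a crude placeholder for a real language guesser such as the trigram recipe linked in step 2. Step 3 (scoring each group against the dbacl models) is left out.

```python
import html
import re
from itertools import groupby

TAG_RE = re.compile(r"<[^>]+>")      # any HTML tag
WORD_RE = re.compile(r"[^\W\d_]+")   # runs of letters, accented ones included


def normalise(raw):
    """Step 1: drop markup, decode entities (&aacute; -> á), drop punctuation."""
    text = TAG_RE.sub(" ", raw)      # strip tags before decoding entities
    text = html.unescape(text)       # standard-library entity decoding
    return WORD_RE.findall(text)     # keep only the words


def toy_guess(word):
    """Hypothetical stand-in for a real language guesser (NOT the trigram code)."""
    w = word.lower()
    if any(c in w for c in "þðæáéíóúý"):
        return "oe"
    if w.endswith(("us", "um", "is", "ae")):
        return "la"
    return "en"


def group_by_language(words, guess):
    """Step 2: maximal runs of adjacent words that share a guessed language."""
    return [(lang, list(run)) for lang, run in groupby(words, key=guess)]


sample = "<b>st&aacute;n</b>, <i>lapis</i>; a stone"
words = normalise(sample)
groups = group_by_language(words, toy_guess)
print(groups)
```

Because groupby only merges neighbouring items with equal keys, each group is maximal by construction; a single misjudged word will split a run, so a real guesser would probably need some smoothing over short runs.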
?
Keith