Germanic Lexicon Project
Message Board


Author: Keith Briggs
Date: 2004-11-08 12:10:56
Subject: Re: Probabilistic correction

What about something like this:

0. We build dbacl models of English, OE (correctly accented; are such corpora available?), and Latin (plenty are available, e.g. http://penelope.uchicago.edu/Thayer/E/Roman/home.html) from corpora of those languages, *not* from the corrected BT files.

1. We strip all punctuation and HTML markup from the BT files we want to check, and replace character entities by the characters themselves (&aacute; by á etc.).

2. Using a language guesser (e.g. the trigram code here http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576), we identify maximal groups of adjacent words all in the same language.

3. We check each of these groups using the appropriate model built in step 0.

?
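A rough sketch of steps 1 and 2 in Python, to make the idea concrete. Everything named here is an assumption for illustration: `clean`, `group_by_language`, `toy_guess` and `TOY_LEXICON` are hypothetical, and `toy_guess` is only a stand-in for a real guesser such as the trigram code linked above. Step 3 (scoring each group against the matching dbacl model) is not shown.

```python
import html
import re
from itertools import groupby

TAG_RE = re.compile(r"<[^>]+>")       # anything that looks like an HTML tag
PUNCT_RE = re.compile(r"[^\w\s]")     # anything that is not a word character or whitespace


def clean(text):
    """Step 1: drop markup, decode entities (&aacute; -> á), strip punctuation."""
    text = TAG_RE.sub(" ", text)      # remove tags before decoding entities
    text = html.unescape(text)        # &aacute; -> á, &aelig; -> æ, &thorn; -> þ
    text = PUNCT_RE.sub(" ", text)
    return text.split()


def group_by_language(words, guess_language):
    """Step 2: maximal runs of adjacent words that get the same language label."""
    return [(lang, list(run)) for lang, run in groupby(words, key=guess_language)]


# Hypothetical stand-in for a real language guesser: a tiny word list per
# language, with English as the fallback. A trigram model would replace this.
TOY_LEXICON = {
    "latin": {"et", "in", "est", "rex"},
    "oe": {"cyning", "þæt", "wæs", "gód"},
}


def toy_guess(word):
    w = word.lower()
    for lang, vocab in TOY_LEXICON.items():
        if w in vocab:
            return lang
    return "english"


sample = "<b>cyning</b>, <i>rex</i> est; a king. g&oacute;d w&aelig;s"
words = clean(sample)
groups = group_by_language(words, toy_guess)
```

Each `(language, words)` pair in `groups` is then what would be handed to the model for that language in step 3. Guessing word by word is the weak point of this sketch; short words are ambiguous, so a real guesser would probably need some context or smoothing across neighbours.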
Keith

Messages in this thread
Name    College/University    Date
Probabilistic correction Keith Briggs 2004-11-04 05:41:11
Re: Probabilistic correction Keith Briggs 2004-11-04 07:49:10
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-04 22:42:53
Re: Probabilistic correction Keith Briggs 2004-11-05 05:31:16
Re: Probabilistic correction Keith Briggs 2004-11-05 06:59:54
Re: Probabilistic correction Keith Briggs 2004-11-05 07:29:53
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-05 09:32:30
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-05 09:48:16
Re: Probabilistic correction Keith Briggs 2004-11-08 05:07:19
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-08 09:12:45
Re: Probabilistic correction Keith Briggs 2004-11-08 09:46:59
Re: Probabilistic correction Keith Briggs 2004-11-08 10:02:13
Re: Probabilistic correction Keith Briggs 2004-11-08 12:10:56
Re: Probabilistic correction Sean Crist Swarthmore College 2004-11-08 15:26:04
Re: Probabilistic correction Keith Briggs 2004-11-09 06:47:45
Re: Probabilistic correction Keith Briggs 2004-11-09 08:50:46
Re: Probabilistic correction Keith Briggs 2004-11-09 09:43:19
Re: Probabilistic correction Keith Briggs 2004-11-09 10:59:49
Italics (was: Probabilistic correction) Sean Crist Swarthmore College 2004-11-09 13:39:13
Re: Probabilistic correction Keith Briggs 2004-11-11 06:57:20