Germanic Lexicon Project
Message Board
Author: Keith Briggs
Date: 2004-11-09 09:43:19
Subject: Re: Probabilistic correction
More tests, to show the effect of increasing the size of the training corpus:
Train on Caesar's Gallic Wars:
> dbacl -0 -q4 -l latin ../Latin/gall*
> db="dbacl -c c -c latin -vN"
> echo "ge-helmian ; p. ode, ede; pp. od, ed" | ${db}
c 100.00% latin 0.00%
> echo "To cover with a helmet, crown;" | ${db}
c 100.00% latin 0.00%
> echo "galeSre, coronare" | ${db}
c 32.41% latin 67.59%
> echo ":-- ÐÚ gehelmodest us" | ${db}
c 100.00% latin 0.00%
> echo "corSnasti nos," | ${db}
c 100.00% latin 0.00%
> echo "Ps. Spl. 5, 15. Of wuldre and weorþmynt ðú" | ${db}
c 100.00% latin 0.00%
> echo "gehelmedest hine" | ${db}
c 100.00% latin 0.00%
> echo "de gloria et hondre coronasti eum," | ${db}
c 99.03% latin 0.97%
> echo "de gloria et honore coronasti eum," | ${db}
c 38.45% latin 61.55%
> echo "Ps. Spl. T. 8, 6. Gehelmod " | ${db}
c 100.00% latin 0.00%
> echo "gáleátus, Ælfc. Gr. 43; Som. 45, II." | ${db}
c 100.00% latin 0.00%
> echo "[Laym, i-helmed : O. H. Ger. gehelmot.]" | ${db}
c 100.00% latin 0.00%
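The repeated echo-and-pipe pattern above could be wrapped in a loop. This is only a sketch: `classify_lines` and `samples.txt` are hypothetical names, and the commented usage line assumes the `c` and `latin` categories have already been trained as in this post.

```shell
# Hypothetical helper: feed each line of a file to a classifier command
# and print the line next to the classifier's output, tab-separated.
#   $1        file with one test string per line
#   $2...     classifier command and its arguments
classify_lines() {
  file=$1
  shift
  while IFS= read -r line; do
    # Run the classifier on this one line only and capture its output.
    result=$(printf '%s\n' "$line" | "$@")
    printf '%s\t%s\n' "$line" "$result"
  done < "$file"
}

# Usage with the classifier from this post (requires dbacl and the
# trained categories, so it is left commented out here):
# classify_lines samples.txt dbacl -c c -c latin -vN
```

Passing the classifier as arguments rather than hard-coding it keeps the loop usable with any filter that reads one line on standard input.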
Add the Annals of Tacitus to the training corpus:
> dbacl -h18 -0 -q4 -l latin ../Latin/gall* ../Latin/tac*
> db="dbacl -c c -c latin -vN"
> echo "ge-helmian ; p. ode, ede; pp. od, ed" | ${db}
c 100.00% latin 0.00%
> echo "To cover with a helmet, crown;" | ${db}
c 100.00% latin 0.00%
> echo "galeSre, coronare" | ${db}
c 58.05% latin 41.95%
> echo ":-- ÐÚ gehelmodest us" | ${db}
c 100.00% latin 0.00%
> echo "corSnasti nos," | ${db}
c 100.00% latin 0.00%
> echo "Ps. Spl. 5, 15. Of wuldre and weorþmynt ðú" | ${db}
c 100.00% latin 0.00%
> echo "gehelmedest hine" | ${db}
c 100.00% latin 0.00%
> echo "de gloria et hondre coronasti eum," | ${db}
c 98.23% latin 1.77%
> echo "de gloria et honore coronasti eum," | ${db}
c 17.58% latin 82.42%
> echo "Ps. Spl. T. 8, 6. Gehelmod " | ${db}
c 100.00% latin 0.00%
> echo "gáleátus, Ælfc. Gr. 43; Som. 45, II." | ${db}
c 100.00% latin 0.00%
> echo "[Laym, i-helmed : O. H. Ger. gehelmot.]" | ${db}
c 100.00% latin 0.00%
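To compare the two runs at a glance, each output line can be reduced to its winning category. A minimal sketch, assuming every output line consists of `category percent%` pairs as in the results above; `best_category` is a hypothetical helper name, not part of dbacl.

```shell
# Hypothetical helper (not part of dbacl): read lines such as
# "c 17.58% latin 82.42%" and print the category with the highest score.
best_category() {
  awk '{
    best = ""; best_pct = ""; max = -1
    for (i = 1; i < NF; i += 2) {
      p = $(i + 1)
      sub(/%$/, "", p)              # strip the trailing percent sign
      if (p + 0 > max) { max = p + 0; best = $i; best_pct = $(i + 1) }
    }
    print best, best_pct
  }'
}

# The "honore" line from the two runs quoted in this post:
echo "c 38.45% latin 61.55%" | best_category   # Gallic Wars only
echo "c 17.58% latin 82.42%" | best_category   # Gallic Wars plus Annals
```

On a tie the first category listed wins, since the comparison is strict.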