Germanic Lexicon Project
Message Board


Author: Ondrej Tichy (Charles University, Prague)
Email: ondrej dot tichy at gmail dot com
Date: 2009-06-13 23:07:30
Subject: Re: Converting Bosworth/Toller into Unicode and XML

Hi Boaz,

Sorry for the late reply, and thanks for your interest in the GLP. If you are still interested in participating, could you let me know by e-mail (ondrej.tichy@gmail.com)? That would make communication a bit simpler!

Cheers,

Ondrej

> Hi,
>
> First of all, I'd like to thank everybody here for your efforts in working on this dictionary. I really find it useful.
>
> Right now, I'm trying to generate a dictionary for offline use from the Bosworth/Toller master file (the most recent version, if I judge correctly, is bt_canon_3.txt). The file format is far from trivial to process (it is not proper XML, among other things), but I have more or less managed to work through it, and I think I can create a script that will convert it into proper XML almost automatically.
>
> As work on the dictionary seems to be ongoing, the question is: should I really create a fork? Wouldn't it be more productive if the master file itself were converted into a new well-formed XML format, conveniently sorted by entries (and not by pages)? If this is considered a goal, I'd be glad to help, since I'm already doing this anyway.
>
> The second major change I'll have to make is to convert the generated dictionary into Unicode. The master file uses a slew of confusing non-standard SGML/HTML/XML entities. Some of them are redundant, and some are just plain wrong (lacking a terminating ";", or misspelled, like "&actue;" instead of "&acute;"). I've already written a simple script to fix all that, and normalized everything as proper Unicode, using combining diacritics where necessary. I still haven't put the Hebrew characters into logical order (they're visually ordered right now, although the diacritics are not), but I think that's going to be simple.
>
> Would any of this work be useful to the project? This can easily solve the problem of not being able to view Greek or Hebrew entries.
>
> Cheers,
> Boaz
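
The entity cleanup Boaz describes could look roughly like the sketch below. This is not his script, which is not shown in the thread; it is a minimal illustration in Python. The two lookup tables are hypothetical: "&aemacr;" is an invented example of a non-standard entity name, and only the "&actue;" misspelling comes from the message itself. Standard names are resolved through Python's html.entities.html5 table, and the result is normalized to NFC, which leaves combining diacritics in place wherever Unicode has no precomposed character.

```python
import re
import unicodedata
from html.entities import html5

# Hypothetical repair table: misspelled entity name -> intended name.
# Only "actue" is attested in the thread; a real table would be longer.
MISSPELLED = {"actue": "acute"}

# Hypothetical non-standard entities mapped to a base letter plus a
# combining mark ("aemacr" is an invented name used for illustration).
CUSTOM = {"aemacr": "\u00e6\u0304"}  # ae ligature + combining macron

# An entity name, with the terminating ";" treated as optional so that
# unterminated references such as "&eacute" are also caught.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def fix_entities(text):
    """Replace known entity references with Unicode and normalize to NFC."""

    def replace(match):
        name = MISSPELLED.get(match.group(1), match.group(1))
        if name in CUSTOM:
            return CUSTOM[name]
        # Unknown names are left untouched rather than guessed at.
        return html5.get(name + ";", match.group(0))

    return unicodedata.normalize("NFC", ENTITY.sub(replace, text))


print(fix_entities("g&aemacr;st"))
```

One limitation of this sketch: an unterminated entity followed directly by letters (for example "&eacutest") is read as a single unknown name and left as-is, so such cases would still need a longest-known-prefix match or manual review.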

Messages in this thread

Subject | Name | College/University | Date
Converting Bosworth/Toller into Unicode and XML | Boaz Yaniv | Hebrew University of Jerusalem | 2009-06-08 17:59:32
Re: Converting Bosworth/Toller into Unicode and XML | Ondrej Tichy | Charles University, Prague | 2009-06-13 23:07:30