Germanic Lexicon Project
Message Board
Author: Ondrej Tichy (Charles University, Prague)
Email: ondrej dot tichy at gmail dot com
Date: 2009-06-13 23:07:30
Subject: Re: Converting Bosworth/Toller into Unicode and XML
Hi Boaz,
Sorry for the late reply, and thanks for your interest in the GLP. If you are still interested in participating, could you let me know by e-mail (ondrej.tichy@gmail.com)? That would make communication a bit simpler!
Cheers,
Ondrej
> Hi,
>
> First of all, I'd like to thank everybody here for your efforts in working on this dictionary. I really find it useful.
>
> Right now, I'm trying to generate a dictionary for offline use from the Bosworth/Toller master file (the most recent version, if I judge correctly, is bt_canon_3.txt). The file format is far from trivial to process (it isn't proper XML, among other things), but I more or less managed to work through it, and I think I can write a script that will convert it into proper XML almost automatically.
>
> As work on the dictionary seems to be ongoing, the question is: should I really fork it? Wouldn't it be more productive if the master file itself were converted into a new, well-formed XML format, conveniently sorted by entries (and not by pages)? If this is considered a goal, I'd be glad to help, since I'm already doing that anyway.
>
> The second major change I'll have to make is to convert the generated dictionary into Unicode. The master file uses a slew of confusing non-standard SGML/HTML/XML entities, some of which are redundant and some of which are just plain wrong (lacking a terminating ";", or misspelled, like "&actue;" instead of "&acute;"). I've already written a simple script to fix all that, and normalized everything as proper Unicode, using combining diacritics where necessary. I still haven't put the Hebrew characters into logical order (they're visually ordered right now, although the diacritics are not), but I think that's going to be simple.
>
> Would any of this work be useful to the project? This can easily solve the problem of not being able to view Greek or Hebrew entries.
>
> Cheers,
> Boaz
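A minimal sketch, in Python, of the kind of entity clean-up described in the quoted message: named entities (with or without the terminating ";") are resolved against the standard HTML5 entity table, known misspellings are corrected first, and the result is composed to Unicode NFC. The names CUSTOM_ENTITIES, MISSPELLINGS and to_unicode are hypothetical, the single custom entity "aemacr" is an assumed placeholder (the real inventory would have to be collected from bt_canon_3.txt itself), and the misspelling table holds only the "&actue;" example from the post. It does not attempt the Hebrew reordering.

```python
import re
import unicodedata
from html.entities import html5  # HTML5 named references; keys look like "aacute;"

# Hypothetical project-specific entity names (assumption, not taken from the
# master file); the real list would be built from bt_canon_3.txt.
CUSTOM_ENTITIES = {
    "aemacr": "\u00e6\u0304",  # ae + combining macron (assumed entity name)
}

# Known misspellings -> intended entity name (only the example from the post).
MISSPELLINGS = {
    "actue": "acute",
}

# A named entity with an optional terminating ";" (some lack it in the source).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def _resolve(match) -> str:
    name = MISSPELLINGS.get(match.group(1), match.group(1))
    if name in CUSTOM_ENTITIES:
        return CUSTOM_ENTITIES[name]
    char = html5.get(name + ";")
    # Unknown names are left untouched so they can be reviewed by hand.
    return char if char is not None else match.group(0)


def to_unicode(text: str) -> str:
    """Replace named entities with Unicode, then compose to NFC."""
    return unicodedata.normalize("NFC", ENTITY_RE.sub(_resolve, text))


print(to_unicode("&aemacr;r &actue; &aacute b"))
```

Two caveats of this sketch: an unterminated entity directly followed by letters (e.g. "&aacutex") is not recognised and stays as-is, and numeric references like "&#225;" are not handled. If the output is written into XML, reserved characters such as "&" and "<" must be escaped again, for example with xml.sax.saxutils.escape.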
Messages in this thread | Name | College/University | Date |
Converting Bosworth/Toller into Unicode and XML | Boaz Yaniv | Hebrew University of Jerusalem | 2009-06-08 17:59:32 |
Re: Converting Bosworth/Toller into Unicode and XML | Ondrej Tichy | Charles University, Prague | 2009-06-13 23:07:30 |