Germanic Lexicon Project
Message Board


Author: Sean Crist (Nuance Communications)
Email: kurisuto1 at yahoo dot com
Date: 2007-09-01 22:24:35
Subject: Re: File Formats

> I'm doing a little project involving quite a number of words, so I
> decided to download all of cleasby-vigfusson, and run my scripts against it,
> rather than hammering your search engine. It's obviously formatted to be
> computer readable, but I haven't found a description of the formatting
> conventions, and have been pretty much working out the details by trial and
> error. (E.g. a blank line indicates either the start of an entry or a page
> number or a page header, except when it's within the
> <Introduction> section.)

Yes, those descriptions of the file format are correct.
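
For anyone scripting against the in-progress file, the blank-line rule quoted above could be sketched roughly as follows. This is a minimal illustration, not the project's own tooling. The function name `split_blocks` is hypothetical, and the closing marker `</Introduction>` is an assumption: only `<Introduction>` is mentioned in this thread, so check the actual file before relying on it. The sketch does not try to distinguish entries from page numbers or page headers; it only groups lines into blank-line-delimited blocks and keeps the introduction together as one block.

```python
from typing import Iterable, Iterator, List

# ASSUMPTION: the introduction is delimited by these two marker lines.
# Only "<Introduction>" appears in the thread; the closing marker is a guess.
INTRO_OPEN = "<Introduction>"
INTRO_CLOSE = "</Introduction>"


def split_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield blank-line-delimited blocks of lines.

    Each block is an entry, a page number, or a page header; this sketch
    does not tell them apart. Blank lines inside the introduction do not
    split it: the whole introduction is yielded as a single block.
    """
    block: List[str] = []
    in_intro = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if in_intro:
            # Inside the introduction, keep everything, blank lines included.
            block.append(line)
            if stripped == INTRO_CLOSE:
                in_intro = False
                yield block
                block = []
            continue
        if stripped == INTRO_OPEN:
            # Flush whatever came before, then start the introduction block.
            if block:
                yield block
            block = [line]
            in_intro = True
        elif stripped == "":
            # A blank line ends the current block, if there is one.
            if block:
                yield block
                block = []
        else:
            block.append(line)
    if block:
        yield block
```

To use it, open the downloaded file with the appropriate encoding and iterate over `split_blocks(f)`; deciding which blocks are entries and which are page numbers or headers is left to the caller, since the thread does not describe how to tell them apart.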

The long-term plan is to express the structure of the dictionary using XML, as is the case with the Torp dictionary of Proto-Germanic (which is much further along in the pipeline than Cleasby/Vigfusson). For now, the file format is very close to plain text; I want to keep things simple and obvious for everyone who is working on corrections, so I haven't added tags to show the start and end of entries, and so on. That will come after the hand corrections are done.

If you click on the "About" tab above, there are various pages about character encoding standards and other aspects of file formats. They might be of some help, although I don't think I ever actually wrote down how entries are divided in the in-progress version of Cleasby/Vigfusson.

--Sean



Messages in this thread

Subject            Name             College/University      Date
File Formats       Arlie Stephens                           2007-09-01 18:26:36
Re: File Formats   Sean Crist       Nuance Communications   2007-09-01 22:24:35