Germanic Lexicon Project
Message Board


Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2005-05-07 18:48:09
Subject: Re: Doubt about coding style

> What's better in your opinion, to do some HTML to make the
> corrected page as similar to the original OCR'ed page as
> possible, or just typing the relevant text taking care of not
> losing contrast and such?

My opinion is that HTML is better than plain text, but that XML is better than HTML. For example, HTML can capture the structure of tables, but XML can do tables and also capture the semantics of the different uses of boldface in the text. Plain text can't do either.
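To make the boldface point concrete, here is a rough Python sketch. The tag names (entry, headword, pos, xref) and the entry text are made up for illustration; they are not the project's actual markup. Two different semantic tags both display as bold, so once the entry is turned into HTML there is no way to tell them apart again.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and placeholder text, for illustration only.
ENTRY_XML = (
    "<entry><headword>word</headword> <pos>noun</pos> "
    "a gloss; see <xref>otherword</xref></entry>"
)

# Headwords and cross-references are both printed in bold,
# so both collapse to <b> when the entry is rendered as HTML.
XML_TO_HTML = {"entry": "p", "headword": "b", "xref": "b", "pos": "i"}


def to_html(xml_text):
    """Rename each semantic tag to the HTML tag used to display it."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = XML_TO_HTML.get(elem.tag, "span")
    return ET.tostring(root, encoding="unicode")


print(to_html(ENTRY_XML))
# <p><b>word</b> <i>noun</i> a gloss; see <b>otherword</b></p>
```

Going from XML to HTML is a simple lookup. Going the other way, somebody has to decide for each <b> whether it was a headword or a cross-reference.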

I wouldn't worry too much about duplicating the exact layout of the original page. The structure is what's important (tables, paragraphs, etc.). The layout falls out from the structure.

By far, the biggest part of the job is correcting the text. So I would say use whatever is most comfortable for you, and then others can convert the text to whatever formats they want. If you give me HTML, I can convert it to XML (it's some work for me, but I'd certainly do it in the case of Wright). You could also mix HTML and non-HTML tags. The important thing is to just be consistent with whatever you do.
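One way to check consistency is to list which tags a file actually uses and how often. The following is only a sketch using Python's standard html.parser module; the sample text and the headword tag are invented for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag, whether it is standard HTML or not."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_inventory(text):
    parser = TagCounter()
    parser.feed(text)
    parser.close()
    return parser.counts


# Invented sample: the first entry marks its headword with a custom tag,
# the second falls back to plain <b>.
sample = (
    "<p><headword>word</headword> <i>noun</i> a gloss</p>"
    "<p><b>other</b> <i>verb</i></p>"
)
print(tag_inventory(sample))
```

If the inventory shows both headword and b in a file where every entry should be marked the same way, that is a hint that the markup drifted partway through.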

There's HTML and then there's HTML. Some programs which produce HTML for you end up including a whole bunch of unnecessary crap. It's easier to work with nice terse HTML which contains only the necessary tags. That's just a preference, however, because the crap doesn't make the file unusable; it just means somewhat more work.
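For what it's worth, here is a minimal sketch of the kind of cleanup involved, again using Python's standard html.parser. The list of tags to keep and the bloated sample line are assumptions for illustration. Note that it throws away all attributes, so a table that relies on colspan or rowspan would need more care than this.

```python
from html import escape
from html.parser import HTMLParser

# Assumed list of structurally meaningful tags; adjust to taste.
KEEP = {"p", "b", "i", "table", "tr", "td", "th"}


class Terse(HTMLParser):
    """Re-emits only the tags in KEEP, with all attributes dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept; re-escape &, < and > for the output.
        self.out.append(escape(data, quote=False))


def make_terse(html_text):
    parser = Terse()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


# Invented example of generated HTML.
bloated = (
    '<p class="MsoNormal" style="margin:0"><span lang="EN-US">'
    '<b style="font-weight:bold">word</b> '
    '<font face="Times">a gloss</font></span></p>'
)
print(make_terse(bloated))
# <p><b>word</b> a gloss</p>
```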

(BTW, I'm planning to create a web form which allows folks to upload files of any type to this server. Once that's done, you could use it to submit the files so that I can post them.)

--Sean

Messages in this thread:

Subject | Name | College/University | Date
Doubt about coding style | Raul Vallejo | | 2005-04-25 16:37:03
Re: Doubt about coding style | Sean Crist | Swarthmore College | 2005-05-07 18:48:09