Germanic Lexicon Project
Message Board
Author: Sean Crist (Swarthmore College)
Email: kurisuto at unagi dot cis dot upenn dot edu
Date: 2004-11-16 13:24:42
Subject: Re: A rival Bosworth & Toller project?
Peter,
Whew. Okay.
This throws yet another unexpected loop into things. When I was
regularly checking in on that project back in 2003, it looked like
progress had stopped after a small number of pages. I stopped paying
attention at some point because it looked like the project was dead.
They obviously got it going again at some point.
I agree that the sensible thing would be for the two projects to
combine their efforts. I'd still be willing to do this. However, I
think it's unlikely to happen, because of the history.
Here is the story, or at least my understanding of it.
I had decided as far back as 1999 to put Bosworth/Toller online, and I
was exploring funding and data entry options. In 2001 I received a
small grant to pay a student to scan the Bosworth/Toller. I put the
page images online because I've always thought it best to "release
early, release often" (uncompleted data is more use than no data). In
2002 I got another small grant, and another student did the OCR and
wrote some programs to do the major automated corrections on the text
(the current text which we're correcting is still messy, but it's far,
far cleaner now than the raw output of the OCR).
For some time, my grand plan for the project has been to digitize Torp
first, then Bosworth/Toller, then Cleasby/Vigfusson. By late 2002,
Torp was nearing completion, so I submitted a major grant proposal to
the National Science Foundation in January 2003. This grant would
have paid a fleet of students to work on correcting Bosworth/Toller,
and would have given me one less course to teach each semester so that
I could manage the project (I wouldn't have been any richer; this
would have just freed up some of my teaching time for the project). I
would hear in June 2003 whether I had gotten the grant.
While I was waiting to hear the results of this grant proposal,
something unexpected happened. On 28 March 2003, Bekie Marett wrote
to me and introduced herself as the coordinator of the Online
Anglo-Saxon Dictionary Project, which had begun on 17 March 2003
(eleven days earlier). She was planning to create an online text
version of Bosworth/Toller. She said that she had credited me on her
site for the scanned pages because they had saved her team an immense
amount of work.
Well, this put me in an uncomfortable situation, to say the least. If
it hadn't been for the grant proposal, I might have said, "This is
good news; someone else has taken on Bosworth/Toller, so I can work on
a different text instead." Now, true, this would have meant ditching
the existing online text which a summer's worth of student salary had
already gone into. But my goal was to get as much corrected text
online as quickly as possible, so this arguably would have been the
right move.
But it wasn't possible for me to drop Bosworth/Toller, because of the
pending grant proposal. If Bekie's project had started before I had
submitted the proposal, I would have said, "Good, Bosworth/Toller is
covered; I'll write the grant for another text instead." But at this
point I had a specific proposal for a specific text in the works. A
proposal of this kind is a _major_ deal to prepare, involving work by
many people; the funding cycles are very slow. It simply wasn't
possible for me to go to the funder while they were midstream in the
lengthy evaluation process and say, "Oops, I decided to do another
text instead."
I wrote back to Bekie and explained the whole situation. I asked if
we could discuss this together and figure out the right thing to do.
I made the suggestion that she and her team consider working on one of
the other texts whose scanned images I had posted.
Bekie got angry at me for suggesting this. Now, I can understand
this. When you've gotten yourself organized and enthusiastic around a
project, it's hard to suddenly drop it and switch gears. I explained
why I could not drop the project because of the grant. She said, "If
there ends up with more than one version online, this is hardly a bad
thing." I thought it would be much better to have one version each of
two different texts, and said so. Hand corrections are a huge amount
of work, and there's no point in doing the same text twice.
When it became clear that Bekie was not going to switch texts, I asked
whether we could somehow combine our efforts, and offered to talk with
the programmer on her team to work out standards for character
encoding and markup. She never responded to that suggestion.
Around this time, Bekie's site un-credited me for the scanning work
and credited Ian Marett instead. Bekie had already told me she was
using the page images from my site, and I doubt very much that they
re-scanned the entire 1,302 pages of text. It appears that Bekie's
project builds on my students' work but does not give proper credit.
Ian probably scanned the 14-page introduction, which I hadn't posted
yet at the time. If so, he deserves credit for this. However, the
credits make it look as if Ian Marett were responsible for scanning
the whole book.
I can only close my eyes and shake my head at this. This is so
ridiculous; we're both trying to give something away for free.
In fairness, there is one thing I did wrong myself, and which I
regret. At one point in our discussion, I asked whether Bekie would
consider taking her site offline until June, when the result of the
grant proposal would be announced. I was worried that one of the
evaluators might find Bekie's site and would deny my proposal on the
grounds that the project was already being done by someone else.
Bekie correctly pointed out that this was not honest of me. When I
thought about it, I realized she was right, and said so. I shouldn't
have suggested that, and I apologized to her.
After April 2003, Bekie and I had no further communication. In June
2003, I learned that I hadn't gotten the NSF grant. I looked at
Bekie's site from time to time. As I watched the completed number of
pages, it looked to me as if they had gotten off to a good start, but
then had lost interest in the project. When the number of completed
pages seemed not to change for a long time, I stopped regularly
checking their site.
I still wanted an online Bosworth/Toller, and I had no funding, so it
looked like the only way this was going to happen was if I set up a
volunteer-based project on my own time. So after I finished
Torp, I went ahead and implemented the web-based correction system
that I had had in mind for some time. That is the system which is in
place now, and it's going well so far.
Now comes unexpected news from Bekie's camp again. After I thought
they had dropped off the radar, it turns out that they have 500-some
pages corrected. When I was still checking their site last year, it
really looked like the project was going nowhere. They must have
really stepped up their efforts in the last few months.
So, now what the hell do we do? I see a few choices:
1) Try approaching Bekie again, and ask for a second time whether she
would consider combining our efforts. Unlikely to be of any use, given
past experience.
2) Press on with our own corrections, even though this means doing pages
which Bekie's team has already done. This involves some unnecessary
duplication of effort.
3) We could incorporate the pages Bekie's team has already corrected
into our own version of the dictionary, give due credit, and then
continue with our own corrections. There is no legal problem with
this, even if we don't have permission, because the text is out of
copyright. However, this idea gives me an icky feeling which I have
not fully sorted out.
I'd definitely appreciate input on how to handle this.
I'm concerned that the whole rotten issue is going to dampen the
enthusiasm of volunteers for both projects (unless it has the positive
effect of stoking a competition that gets the project done faster; I
have no idea how others will react to the situation I've just
described). I wish this issue would just go away. I just wanted to
create something useful that we could all share. I didn't ever mean
to get into this kind of politics.
Bekie had told me that her team is only doing the main 1,302-page
volume and is not planning to digitize the 768-page supplement. So if
someone wants to be sure that their work isn't redundant, they could
correct pages from the supplement. Nobody else is working on it.
So I don't know what the hell to do. I'm not sure what I should have
done differently to avoid this mess. I'm very much open to input on
how to proceed from here.
--Sean