Since wordnet is (even in its outdated version) one of the most useful general-purpose
dictionaries for dictd, this is highly unfortunate.

Digging through the hundreds of lines of string-processing C code in wnfilter to find
out exactly what it does wrong with files in the new wordnet format seemed like more
pain than it would be worth to me, so I wrote a replacement for wnfilter in Python
instead.

The format parser and writer are mostly based on the description of the wordnet
format in wndb(7) and the description of the dictd file format in dictd(8), coupled
with a few experiments to figure out some details of the dictd format that dictd(8)
doesn't mention.
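
As an illustration of the kind of detail involved, here is a minimal sketch of how a
dictd index entry can be built. It is not taken from wordnet_structures.py, and the
names b64_number and index_line are made up for this example. It assumes that each
index line is "headword<TAB>offset<TAB>length", and that offset and length are written
as base-64 numbers using the standard base64 alphabet, most significant digit first,
without padding; check this against your dictd version before relying on it.

```python
# Sketch only; not the code used by wordnet_structures.py.
# Assumption: dictd index numbers use the standard base64 alphabet as digits.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_number(value):
    """Encode a non-negative integer as base-64 digits, most significant first."""
    if value < 0:
        raise ValueError("offsets and lengths cannot be negative")
    digits = []
    while True:
        value, rem = divmod(value, 64)
        digits.append(B64_ALPHABET[rem])
        if value == 0:
            break
    return "".join(reversed(digits))


def index_line(headword, offset, length):
    """Build one tab-separated index entry (hypothetical helper)."""
    return "%s\t%s\t%s\n" % (headword, b64_number(offset), b64_number(length))
```

With these assumptions, an entry for "dog" starting at byte 64 of the data file and
spanning 1 byte would be written as index_line("dog", 64, 1).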

wordnet_structures.py self-documents if called with --help, but here's a typical usage
example:

sebastian@raquel$ time ./wordnet_structures.py /usr/share/wordnet/index.adv \
/usr/share/wordnet/data.adv /usr/share/wordnet/index.adj /usr/share/wordnet/data.adj \
/usr/share/wordnet/index.noun /usr/share/wordnet/data.noun \
/usr/share/wordnet/index.verb /usr/share/wordnet/data.verb
Opening index file '/usr/share/wordnet/index.adv'...
Opening data file '/usr/share/wordnet/data.adv'...
Parsing index file and data file...
Opening index file '/usr/share/wordnet/index.adj'...
Opening data file '/usr/share/wordnet/data.adj'...
Parsing index file and data file...
Opening index file '/usr/share/wordnet/index.noun'...
Opening data file '/usr/share/wordnet/data.noun'...
Parsing index file and data file...
Opening index file '/usr/share/wordnet/index.verb'...
Opening data file '/usr/share/wordnet/data.verb'...
Parsing index file and data file...
All input files parsed. Writing output to index file 'wn.index' and data file
'wn.dict'.
All done.

real    1m37.530s
user    1m14.709s
sys     0m3.684s
sebastian@raquel$ dictzip wn.dict
sebastian@raquel# mv wn* /usr/share/dictd/
sebastian@raquel# /etc/init.d/dictd restart

The created dictd databases work fine for me.

Sebastian Hagen <sebastian_hagen@memespace.net>
Tue, 05 Jun 2007 02:49:04 +0200

