Senseval-3 System Code and Documentation
This page contains links to the code, documentation, and shell scripts
used to create the University of Minnesota, Duluth systems that were used
in the
Senseval-3
word sense disambiguation exercise.
The Duluth systems that participated in the supervised lexical sample
tasks for Senseval-3 were based on the Duluth systems that participated
in Senseval-2. The Duluth system that
participated in the unsupervised lexical sample task was a new system,
known at that time as SenseRelate (version 0.5), which has since
been superseded by
WordNet::SenseRelate::TargetWord.
In addition, Syntalex
is a system that participated in Senseval-3 that extends the
Duluth Senseval-2 system by incorporating part of speech features and
syntactic features. This was developed by
Saif Mohammad
as a part of his
M.S. thesis.
Duluth Unsupervised Lexical Sample System (Duluth-LSU)
This system is based entirely on
WordNet::SenseRelate::TargetWord, which uses
WordNet::Similarity
to measure the relatedness between a target word and its neighbors. A few
simple driver scripts (Duluth-SR) that run the algorithm on all the
Senseval-3 words can be downloaded
here.
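The idea of choosing a sense by its relatedness to neighboring words can be sketched in a few lines of Python. This is a simplified illustration only, not the actual WordNet::SenseRelate::TargetWord code: the function pick_sense, the sense labels, and the toy relatedness table are all hypothetical, and a real run would obtain its scores from a WordNet::Similarity measure.

```python
from typing import Callable, Dict, List, Optional


def pick_sense(
    target_senses: List[str],
    neighbor_senses: Dict[str, List[str]],
    relatedness: Callable[[str, str], float],
) -> Optional[str]:
    """Return the target sense with the highest total relatedness to its neighbors."""
    best_sense, best_score = None, float("-inf")
    for sense in target_senses:
        score = 0.0
        for candidates in neighbor_senses.values():
            # Each neighbor contributes the score of its best-matching sense.
            score += max((relatedness(sense, c) for c in candidates), default=0.0)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical relatedness scores standing in for a real WordNet-based measure.
TOY_SCORES = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#finance", "loan#debt"): 0.8,
    ("bank#river", "money#currency"): 0.1,
    ("bank#river", "loan#debt"): 0.05,
}


def toy_relatedness(a: str, b: str) -> float:
    return TOY_SCORES.get((a, b), 0.0)


chosen = pick_sense(
    ["bank#river", "bank#finance"],
    {"money": ["money#currency"], "loan": ["loan#debt"]},
    toy_relatedness,
)
print(chosen)
```

In this toy setup the financial sense of "bank" wins because it is more related to both neighbors than the river sense is.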
Quick Summary: You need to install
WordNet::Similarity, WordNet::SenseRelate::TargetWord and the Duluth-SR
drivers mentioned above. Note that Duluth-SR refers to SenseRelate
version 0.5, which has since been renamed
WordNet::SenseRelate::TargetWord.
Supervised Lexical Sample Systems (Duluth-xLSS)
There were three main components to these systems: The Ngram Statistics
Package, SenseTools, and Weka. All of these are freely available and can
be linked together via the DuluthShell (v0.3) C-shell scripts available
from
this page. Our objective is to make it possible for you to easily
replicate the Duluth systems, and then go on to develop your own!
The DuluthShell was developed for Senseval-2 and re-used in a
modified form for Senseval-3.
In particular, duluth3 and duluth8 were re-used. The DuluthShell can
be downloaded
here. In addition to the
C-shell scripts, this also includes Senseval data ready for
processing, and instructions telling where to find and how to install
all of the various components. Consult the
README for a description
of what is available and how to set things up.
The Ngram Statistics Package (v0.69) was used to identify
interesting bigrams and co-occurrences for use as features for the
learning algorithms supported in Weka. NSP is written in Perl and
distributed under the
GNU CopyLeft.
Download it
here.
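As a rough illustration of what "identifying interesting bigrams" involves, the sketch below counts adjacent word pairs and ranks them with pointwise mutual information. It is not NSP itself, and it does not claim to reproduce the measure or cutoffs used in the Duluth systems: the function score_bigrams, the min_count threshold, and the sample text are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def score_bigrams(tokens: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """Score each sufficiently frequent bigram by pointwise mutual information."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), joint in bigrams.items():
        if joint < min_count:
            continue  # drop rare pairs, whose scores are unreliable
        p_joint = joint / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return scores


tokens = "the fine wine and the fine art of fine wine".split()
scores = score_bigrams(tokens)
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

Bigrams that survive the frequency cutoff and score well are the kind of candidates that could then be turned into binary features for a learning algorithm.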
SenseTools (v0.3) was used to format the Senseval text for NSP processing
and also to convert the output of the Ngram Statistics Package into a
form that the machine learning component Weka can process.
Download it
here.
All of the machine learning was carried out with
Weka,
a suite of Java programs that implement a wide range of machine learning
algorithms. It is freely available from the University of Waikato in New
Zealand. Download it
here.
Quick Summary:
You should download
Duluth-Shell v0.3,
NSP v0.69 (or later),
SenseTools-0.3, and
Weka (at
least 3.2.1; note that you will need the new version of WekaClassify,
a SenseTools component, if you use 3.4 or later).
Start with the
Duluth-Shell README
for an overview of the installation process. NSP and SenseTools have
README files too.
Related Publications
By:
Ted Pedersen
- tpederse AT d umn edu