Senseval-2 System Code and Documentation
[Feb 5, 2002 - The complete Duluth systems that participated in
Senseval-2 are now available. If you download and install BSP (at
least v0.4), SenseTools (at least v0.1), and Weka (at least v3.2.1), you
can use the supporting C shell scripts (Duluth-Shell) to replicate the
Duluth systems from Senseval-2. Complete README files are included with
each of these packages. Please contact me with any questions or comments:
tpederse AT d umn edu]
This page contains links to the code, documentation, and shell scripts
used to create the University of Minnesota, Duluth systems that took part
in the Senseval-2 word sense disambiguation exercise. These systems have
three main components: the Bigram Statistics Package (BSP), SenseTools,
and Weka. All of these are freely available and can be linked together
via the Duluth-Shell C shell scripts available from this page. Our
objective is to make it easy for you to replicate the Duluth systems and
then go on to develop your own!
The Duluth systems are a combination of the Bigram Statistics Package,
SenseTools, and the machine learning system Weka. The Duluth-Shell is a
set of C shell scripts that link all of these components together; it
can be downloaded here. In addition to the C shell scripts, the
Duluth-Shell package includes Senseval data ready for processing and
instructions on where to find and how to install each of the components.
Consult the README for a description of what is available and how to set
things up.
The Bigram Statistics Package (v0.4) was used to identify
interesting bigrams and co-occurrences for use as features for the
learning algorithms supported in Weka. BSP is written in Perl and
distributed under the GNU CopyLeft. Download it here.
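As a rough illustration of what "identifying interesting bigrams" can mean, the sketch below counts adjacent word pairs and ranks them with the Dice coefficient. This is not BSP's code (BSP is written in Perl), and the choice of the Dice coefficient, the function names, the `min_count` cutoff, and the sample text are all assumptions made for the example. Consult the BSP README for the measures and options BSP actually provides.

```python
from collections import Counter


def count_bigrams(tokens):
    """Count adjacent word pairs in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def dice_scores(tokens, min_count=2):
    """Score each bigram seen at least min_count times.

    The Dice coefficient used here is only a stand-in association
    measure chosen for the example (an assumption, not BSP's method).
    """
    bigrams = count_bigrams(tokens)
    left = Counter(tokens[:-1])   # how often each word starts a bigram
    right = Counter(tokens[1:])   # how often each word ends a bigram
    scores = {}
    for (w1, w2), n in bigrams.items():
        if n >= min_count:
            scores[(w1, w2)] = 2.0 * n / (left[w1] + right[w2])
    return scores


# Hypothetical sample text, used only to exercise the functions.
tokens = "the fine art of fine art and the art of war".split()
for bigram, score in sorted(dice_scores(tokens).items(), key=lambda kv: -kv[1]):
    print(bigram, round(score, 3))
```

Bigrams that pass the frequency cutoff and score well on the chosen measure are the ones that would be kept as candidate features.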
SenseTools (v0.1) was used to format the Senseval text for BSP processing
and also to convert the output of the Bigram Statistics Package into a
form that the machine learning component Weka can process.
SenseTools is written in Perl and distributed under the GNU CopyLeft.
Download it here or consult the README first.
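The conversion step can be pictured as turning each training instance into a row of binary features, one per selected bigram, plus a sense label. The sketch below writes such rows in ARFF, the text format that Weka reads. It is not SenseTools code (SenseTools is written in Perl): the function name `to_arff`, the attribute naming scheme, and the sense labels `s1` and `s2` are hypothetical, and the sketch assumes that words contain no quote characters.

```python
def to_arff(relation, features, instances):
    """Render (tokens, sense) pairs as ARFF text.

    features  -- list of (word1, word2) bigrams, one binary attribute each
    instances -- list of (token_list, sense_label) pairs
    """
    senses = sorted({sense for _, sense in instances})
    lines = ["@relation " + relation, ""]
    for w1, w2 in features:
        # Attribute names are quoted; words are assumed to contain no quotes.
        lines.append("@attribute '%s_%s' {0,1}" % (w1, w2))
    lines.append("@attribute sense {%s}" % ",".join(senses))
    lines += ["", "@data"]
    for tokens, sense in instances:
        present = set(zip(tokens, tokens[1:]))
        row = ["1" if f in present else "0" for f in features]
        lines.append(",".join(row + [sense]))
    return "\n".join(lines) + "\n"


# Hypothetical features and labeled instances, for illustration only.
features = [("fine", "art"), ("art", "of")]
instances = [
    ("the fine art".split(), "s1"),
    ("art of war".split(), "s2"),
]
print(to_arff("demo", features, instances))
```

Each data row lists the feature values in attribute order and ends with the sense label, which is the class a Weka learner would be trained to predict.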
All of the machine learning was carried out with Weka,
a suite of Java programs that implement a wide range of machine learning
algorithms. It is freely available from the University of Waikato in New
Zealand. Download it here.
You can find brief descriptions of all the
participating systems (including those from Duluth)
here. The Duluth systems were used in the English and Spanish lexical
sample tasks.
Quick Summary
You should download Duluth-Shell, BSP v0.4, SenseTools-0.1, and
Weka (at least 3.2.1).
Start with the
Duluth-Shell README
for an overview of the installation process. BSP and SenseTools have
README files too.
Related Publications
By: Ted Pedersen - tpederse AT d umn edu