SenseTools
SenseTools is a package of Perl programs (and one Java program) that
converts Senseval-2 formatted sense-tagged text into the arff format
required for input to Weka, a suite of Java programs that implement a
wide range of machine learning algorithms.
As a result, SenseTools allows you to carry out supervised word sense
disambiguation experiments using any learning algorithm found in Weka
(including decision trees, neural networks, Naive Bayesian
classifiers, support vector machines, and rule-based learners).
SenseTools converts sense-tagged text into a plain text form that
can be used by the Ngram Statistics Package (NSP) to identify lexical
features such as unigrams, bigrams, and collocations. It also provides
programs that extract features identified by NSP (or manually designated
by the user) from the Senseval-2 formatted sense-tagged text. Ultimately,
SenseTools represents the sense-tagged text in the arff format, which
can then be used as input to Weka.
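To make the target format concrete, here is a minimal sketch of what an
arff file for sense-tagged instances might look like when written from
Java. This is not SenseTools code: the class ArffSketch, the method
toArff, the feature names, and the sense labels are all hypothetical.
The sketch assumes binary features (1 if the feature occurs in the
context of an instance, 0 otherwise) and names that contain no spaces or
special characters.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of SenseTools.
final class ArffSketch {

    /** Builds arff text with one {0,1} attribute per feature and a nominal sense class. */
    static String toArff(String relation, List<String> features,
                         List<String> senses, List<int[]> rows, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String feature : features) {
            sb.append("@attribute ").append(feature).append(" {0,1}\n");
        }
        // The class attribute lists every sense that can be assigned.
        sb.append("@attribute sense {").append(String.join(",", senses)).append("}\n\n");
        sb.append("@data\n");
        for (int i = 0; i < rows.size(); i++) {
            for (int value : rows.get(i)) {
                sb.append(value).append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two made-up instances of an ambiguous word with two made-up senses.
        System.out.print(toArff("line",
                Arrays.asList("telephone", "product"),
                Arrays.asList("cord", "phone"),
                Arrays.asList(new int[] {1, 0}, new int[] {0, 1}),
                Arrays.asList("phone", "cord")));
    }
}
```

Each row under @data is one sense-tagged instance: its feature values
followed by its sense tag.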
Once Weka has learned a model, SenseTools provides a Java program
(WekaClassify) that classifies a set of test/evaluation data in arff
format using that previously learned model. Its output is the
distribution of "scores": for each instance in the test data, the
probability or confidence associated with each possible
answer/classification.
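As an illustration of how such a distribution can be read, the sketch
below picks the highest-scoring answer for a single instance. It does
not reproduce the WekaClassify output format or its code: the class
ScoreSketch, the method best, and the sense labels and scores are
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of WekaClassify.
final class ScoreSketch {

    /**
     * Returns the label with the highest score. On a tie, the label that
     * comes first in the map's iteration order wins.
     */
    static String best(Map<String, Double> scores) {
        String bestLabel = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                bestScore = entry.getValue();
                bestLabel = entry.getKey();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Made-up scores for one test instance.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("cord", 0.15);
        scores.put("phone", 0.80);
        scores.put("product", 0.05);
        System.out.println(best(scores)); // prints phone
    }
}
```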
SenseTools also provides a number of simple methods for creating ensembles
of classifiers based on WekaClassify output, and for scoring the results
of sense-tagging against a manually provided gold standard using
precision and recall.
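The following sketch shows one simple way an ensemble and its scoring
could work; it is not the SenseTools implementation. It assumes the
ensemble sums the score distributions of its member classifiers and
picks the top sense, and that precision is correct/attempted while
recall is correct/total, with -1 marking an instance the system did not
attempt. The class EnsembleSketch and its methods are hypothetical.

```java
// Hypothetical sketch, not part of SenseTools.
final class EnsembleSketch {

    /** Sums the classifiers' score vectors and returns the index of the top sense. */
    static int vote(double[][] distributions) {
        double[] total = new double[distributions[0].length];
        for (double[] distribution : distributions) {
            for (int i = 0; i < distribution.length; i++) {
                total[i] += distribution[i];
            }
        }
        int best = 0;
        for (int i = 1; i < total.length; i++) {
            if (total[i] > total[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Returns {precision, recall}; a predicted value of -1 means "not attempted". */
    static double[] precisionRecall(int[] predicted, int[] gold) {
        int attempted = 0;
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (predicted[i] < 0) {
                continue; // unattempted instances lower recall but not precision
            }
            attempted++;
            if (predicted[i] == gold[i]) {
                correct++;
            }
        }
        double precision = attempted == 0 ? 0.0 : (double) correct / attempted;
        double recall = gold.length == 0 ? 0.0 : (double) correct / gold.length;
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        // Three classifiers scoring one instance over two senses: sums are 1.1 and 1.9.
        double[][] scores = {{0.6, 0.4}, {0.3, 0.7}, {0.2, 0.8}};
        System.out.println(vote(scores)); // prints 1

        // Four instances, one unattempted: 2 correct of 3 attempted, 2 correct of 4 total.
        double[] pr = precisionRecall(new int[] {0, 1, -1, 1}, new int[] {0, 0, 1, 1});
        System.out.println(pr[0] + " " + pr[1]);
    }
}
```

Precision and recall are equal whenever every instance is attempted;
they differ only when the system declines to answer some instances.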
Current version (Use with Weka 3-4)
Previous versions (Use with Weka 3-2)
Related Tools
- WSD-Shell is the successor to the
DuluthShell. It acts as a driver that runs a large number of WSD
experiments using SenseTools and Weka.
- DuluthShell-v0.3 (released 05/10/03). See the README.
(Must be run with SenseTools-0.3.)
- SenseClusters uses preprocess.pl,
nsp2regex.pl, and the logic of xml2arff.
WekaClassify
WekaClassify may be of general interest to Weka users, so we provide it
in a separate distribution.
- If you are using Weka 3-4, then you must use WekaClassify
(version 0.4), which also includes a README.
- If you are using Weka 3-2, then you must use WekaClassify
(version 0.3), which also includes a README and a number of test cases.
By:
Ted Pedersen
- tpederse AT d umn edu