Supervised Word Sense Disambiguation
Supervised WSD uses machine learning techniques to learn classifiers that
assign senses to words in text. These classifiers are trained on manually
sense-tagged corpora. In general, such classifiers are best suited to
lexical sample (target word) disambiguation, where every occurrence of a
given target word in a text is assigned a sense.
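The supervised setting described above can be sketched in a few lines. This is a minimal illustration only, not code from any of the packages listed below. It assumes a Naive Bayes classifier over bag-of-words context features with add-one smoothing. The target word ("bank"), the training contexts, and the sense labels FINANCE and RIVER are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy sense-tagged training data for one target word ("bank").
# These contexts and sense labels are invented for illustration only.
TRAIN = [
    ("he deposited cash at the bank", "FINANCE"),
    ("the bank raised its interest rate", "FINANCE"),
    ("they fished from the river bank", "RIVER"),
    ("grass grew along the muddy bank", "RIVER"),
]

def train_naive_bayes(examples):
    """Count sense frequencies and per-sense word frequencies (bag of words)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        for word in context.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab

def classify(context, model):
    """Return the sense with the highest add-one-smoothed log probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior P(sense)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context.split():
            # Counter returns 0 for unseen words, so smoothing avoids log(0).
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train_naive_bayes(TRAIN)
print(classify("she earned interest at the bank", model))  # FINANCE
print(classify("ducks swam near the river bank", model))   # RIVER
```

Naive Bayes is used here only because it is short and easy to follow. Any classifier trained on the same sense-tagged contexts would illustrate the idea equally well.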
We have developed a number of different packages for supervised word sense
disambiguation. Each of these has somewhat different capabilities.
- The WSDShell is loosely derived from the Duluth-Shell. It has been used
(along with SenseTools) to create a system that participated in the I2B2
Obesity Challenge (2008).
- CuiTools disambiguates biomedical text relative to senses (Concept
Unique Identifiers, or CUIs) found in the UMLS.
- The Syntalex system participated in Senseval-3 (2004). It extended the
Duluth-Shell by adding part-of-speech and some simple syntactic features.
- WSDGate integrates NSP and Weka into the GATE environment to perform
WSD.
- The Duluth-Shell integrates the Ngram Statistics Package (NSP), Weka,
and SenseTools in order to create supervised WSD systems based on simple
lexical features such as unigrams and bigrams. The Duluth-Shell was used
to create systems that participated in the lexical sample tasks of
Senseval-2 (2001) and Senseval-3 (2004), and in the I2B2 Smoking
Challenge (2006).
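The idea of building a classifier from simple lexical features such as unigrams and bigrams can be sketched as follows. This is not the NSP/SenseTools pipeline itself. It assumes that features are chosen by a simple frequency cutoff over the training contexts (the hypothetical `min_count` parameter) and that each instance becomes a binary vector, the kind of input a learner such as those in Weka could consume. The contexts and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical training contexts for one target word; invented for illustration.
CONTEXTS = [
    "the bank raised its interest rate",
    "the bank cut its interest rate",
    "they fished from the river bank",
]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_features(contexts, min_count=2):
    """Keep unigrams and bigrams seen at least min_count times in training."""
    counts = Counter()
    for context in contexts:
        tokens = context.split()
        counts.update(ngrams(tokens, 1))
        counts.update(ngrams(tokens, 2))
    # Sort so that the feature order (and thus vector layout) is deterministic.
    return sorted(f for f, c in counts.items() if c >= min_count)

def to_vector(context, features):
    """Binary vector: 1 if the feature occurs in the context, else 0."""
    tokens = context.split()
    present = set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))
    return [1 if f in present else 0 for f in features]

features = select_features(CONTEXTS)
print(len(features))                                  # 8
print(to_vector("the river bank flooded", features))  # [1, 0, 0, 0, 0, 0, 1, 0]
```

A frequency cutoff is only one way to pick features. Association measures over bigrams, such as those NSP computes, are a natural alternative, but they are not shown here.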
Supervised Word Sense Disambiguation Development Team
Acknowledgments
The development of Supervised Word Sense Disambiguation methods has been
supported by a National Science Foundation Faculty Early Career
Development (CAREER) Program award (#0092784, 2001-2007).
By: Ted Pedersen - tpederse AT d umn edu