Supervised Word Sense Disambiguation

Supervised WSD seeks to learn classifiers that can assign senses to words in text using machine learning techniques. These classifiers are trained on manually sense-tagged corpora. In general, these classifiers are most suitable for lexical sample (target word) disambiguation, where all the occurrences of a given word in a text are assigned a sense.

We have developed a number of different packages for supervised word sense disambiguation. Each of these has somewhat different capabilities.

The WSDShell is loosely derived from the Duluth-Shell. It has been used (along with SenseTools) to create a system that participated in the I2B2 Obesity Challenge (2008).
CuiTools disambiguates biomedical text relative to senses (CUIs) found in the UMLS.
The Syntalex system participated in Senseval-3 (2004) and extended the Duluth-Shell by adding part-of-speech and some simple syntactic features.
WSDGate integrates NSP and Weka into the Gate environment to perform WSD.
The Duluth-Shell integrates the Ngram Statistics Package (NSP), Weka, and SenseTools in order to create supervised WSD systems based on simple lexical features like unigrams and bigrams. The Duluth-Shell was used to create systems that participated in the lexical sample tasks of Senseval-2 (2001) and Senseval-3 (2004), and in the I2B2 Smoking Challenge (2006).
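
To make the idea of supervised WSD over simple lexical features concrete, here is a minimal sketch in Python. It is not the code of any package listed above: the names extract_features and NaiveBayesWSD, the toy training sentences, and the choice of a simplified Naive Bayes classifier are all assumptions made for illustration. The packages themselves rely on NSP for feature identification and on Weka for learning.

```python
import math
from collections import Counter, defaultdict


def extract_features(context):
    """Return the set of unigram and bigram features in a context string.

    Hypothetical helper: whitespace tokenization and lowercasing only.
    """
    tokens = context.lower().split()
    unigrams = {("uni", tok) for tok in tokens}
    bigrams = {("bi", a, b) for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


class NaiveBayesWSD:
    """Simplified Naive Bayes over binary features for one target word.

    Illustrative assumption: only features present in the test context
    contribute to the score, with add-one smoothing.
    """

    def __init__(self):
        self.sense_counts = Counter()               # instances per sense
        self.feature_counts = defaultdict(Counter)  # sense -> feature -> count
        self.vocabulary = set()                     # features seen in training

    def train(self, instances):
        """instances: iterable of (context, sense) pairs, manually tagged."""
        for context, sense in instances:
            self.sense_counts[sense] += 1
            for feature in extract_features(context):
                self.feature_counts[sense][feature] += 1
                self.vocabulary.add(feature)

    def predict(self, context):
        """Return the highest-scoring sense for the given context."""
        features = extract_features(context) & self.vocabulary
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            for feature in features:
                seen = self.feature_counts[sense][feature]
                score += math.log((seen + 1) / (count + 2))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy sense-tagged lexical sample for the target word "bank" (invented data).
training_data = [
    ("he deposited money in the bank account", "financial"),
    ("the bank approved the loan", "financial"),
    ("we sat on the river bank fishing", "river"),
    ("the muddy bank of the stream", "river"),
]

classifier = NaiveBayesWSD()
classifier.train(training_data)
prediction = classifier.predict("she opened a bank account")
```

In this sketch each occurrence of the target word is classified independently from the words and word pairs around it, which mirrors the lexical sample setting described above. A real system would train on a much larger sense-tagged corpus and would typically filter features by frequency or a measure of association.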

Publications

Supervised Word Sense Disambiguation Development Team

Ted Pedersen tpederse AT d umn edu
Saif Mohammad saif AT umd edu
Mahesh Joshi maheshj AT cmu edu
Bridget McInnes bthomson AT umn.edu

Acknowledgments

The development of Supervised Word Sense Disambiguation methods has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007).

By: Ted Pedersen - tpederse AT d umn edu