SenseClusters
SenseClusters is a package of (mostly) Perl programs that allows a user to
cluster similar contexts together using unsupervised knowledge-lean
methods. These techniques have been applied to word sense discrimination,
email categorization, and name discrimination. The supported
methods include the native SenseClusters techniques and Latent
Semantic Analysis.
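As a rough illustration of what clustering similar contexts means, here is a minimal Python sketch. It is not SenseClusters code and does not use CLUTO or Latent Semantic Analysis. It assumes plain bag-of-words context vectors, cosine similarity, and a simple single-link grouping with a made-up threshold. The function names, the stopword list, and the four toy contexts are all hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from SenseClusters).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was"}


def context_vector(text, exclude=()):
    """Bag-of-words counts for one context, ignoring stopwords and excluded words."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    skip = STOPWORDS | set(exclude)
    return Counter(t for t in tokens if t and t not in skip)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_contexts(contexts, exclude=(), threshold=0.2):
    """Single-link grouping: link any two contexts whose similarity >= threshold."""
    vectors = [context_vector(c, exclude) for c in contexts]
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy name-discrimination input: four invented contexts for one ambiguous name.
contexts = [
    "George Miller published a paper on WordNet and lexical semantics",
    "The psychologist George Miller studied memory and lexical semantics",
    "George Miller directed the film Mad Max",
    "Director George Miller released a new Mad Max film",
]

# The ambiguous name itself is excluded so it does not make every context look alike.
clusters = cluster_contexts(contexts, exclude={"george", "miller"})
print(clusters)  # [[0, 1], [2, 3]]
```

The number of clusters found (two here) plays the role of the number of distinct people sharing the name. The fixed threshold is only a stand-in for a real clustering criterion.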
A video tutorial from EACL 2006, "Language Independent Methods of
Clustering Similar Contexts" (135 minutes), introduces SenseClusters.
It is also available on YouTube.
We have mailing lists for users, news, and developers.
"Computational Approaches to Measuring the Similarity of Short
Contexts: A Review of Applications and Methods"
gives a good idea of the kinds of problems that can be
approached with SenseClusters.
Sample Data
- Name Discrimination data that is ready to run. Determine how many
  people share the same ambiguous name!
- Sense Tagged data that is ready to run. Determine the number of
  meanings for each word! (Use data in Senseval-2 format)
Download the current version (v1.05, released October 3, 2015) from
CPAN or SourceForge.
Other Packages Used by SenseClusters
- Developed outside of UMD
  - Bit-Vector (CPAN Bit-Vector module)
  - CLUTO (Our Clustering Engine)
  - PDL (Fast Matrix Operations in Perl)
  - SVDPACKC (Fast Singular Value Decomposition)
  - Set-Scalar (CPAN Set-Scalar module)
- Developed at UMD
SenseClusters Development Team
Acknowledgments
The development of SenseClusters has been supported by a National Science
Foundation Faculty Early Career Development (CAREER) Program award
(#0092784, 2001-2007).