SenseClusters

SenseClusters is a package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods. These techniques have been applied to word sense discrimination, email categorization, and name discrimination. The supported methods include the native SenseClusters techniques and Latent Semantic Analysis.

A video tutorial from EACL 2006, "Language Independent Methods of Clustering Similar Contexts" (135 minutes), introduces SenseClusters. It is also available on YouTube.

We have mailing lists for users, news, and developers.

"Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods" gives a good idea of the kinds of problems that can be approached with SenseClusters.

Sample Data

Name Discrimination data that is ready to run. Determine how many people share the same ambiguous name!
Sense-tagged data that is ready to run. Determine the number of meanings for each word! (Use data in Senseval-2 format)

Download the current version (v1.05, released October 3, 2015) from CPAN or SourceForge.

Documentation

Publications

Other Packages Used by SenseClusters

Developed outside of UMD
- Bit-Vector (CPAN Bit-Vector module)
- CLUTO (Our Clustering Engine)
- PDL (Fast Matrix Operations in Perl)
- SVDPACKC (Fast Singular Value Decomposition)
- Set-Scalar (CPAN Set-Scalar module)
Developed at UMD
- Algorithm-Munkres (CPAN module that implements Munkres' solution to the assignment problem)
- Algorithm-RandomMatrixGeneration (CPAN module to generate random tables for Gap Statistic)
- Math-SparseMatrix (CPAN module for sparse matrix operations)
- Math-SparseVector (CPAN module for sparse vector operations)
- Text-NSP (Ngram Statistics Package, CPAN module used for lexical feature identification)

SenseClusters Development Team

Ted Pedersen tpederse AT d umn edu
Amruta Purandare amruta AT cs pitt edu
Anagha Kulkarni anaghak AT cs cmu edu
Mahesh Joshi maheshj AT cmu edu

Acknowledgments

The development of SenseClusters has been supported by a National Science Foundation Faculty Early Career Development (CAREER) Program award (#0092784, 2001-2007).