SenseClusters Publications

These publications describe the development and use of the SenseClusters package. Papers prior to 2003 trace the origins of the methodology.

2019

Approaching Terminological Ambiguity in Cross-Disciplinary Communication as a Word Sense Induction Task. A Pilot Study (Mennes, Pedersen, and Lefever), Language Resources and Evaluation, 53, 889-917, Springer.

2015

Duluth : Word Sense Discrimination in the Service of Lexicography (Pedersen) - Appears in the Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), June 2015, pp. 282-286, Denver, CO.

2013

Duluth: Word Sense Induction Applied to Web Page Clustering (Pedersen) - Appears in the Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the Second Joint Conference on Lexical and Computational Semantics (*SEM-2013), June 13-15, 2013, pp. 202-206, Atlanta, Georgia.

2010

The Effect of Different Context Representations on Word Sense Discrimination in Biomedical Texts (Pedersen) - Appears in the Proceedings of the 1st ACM International Health Informatics Symposium, November 11-12, 2010, pp. 56-65, Arlington, VA. [acceptance rate 17%]
Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods (Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2010/118, October 2010. (Also available from CMP-LG E-Print Archive as 0806.3787)
Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2 (Pedersen) - Appears in the Proceedings of the SemEval 2010 Workshop : the 5th International Workshop on Semantic Evaluations, July 15-16, 2010, pp. 363-366, Uppsala, Sweden.

2009

Improved Unsupervised Name Discrimination with Very Wide Bigrams and Automatic Cluster Stopping (Pedersen) - Appears in the Proceedings of the Tenth International Conference on Intelligent Text Processing and Computational Linguistics, March 1-7, 2009, pp. 294-305, Mexico City. [acceptance rate 26%]

2008

Name Discrimination and E-mail Clustering Using Unsupervised Clustering of Similar Concepts (Kulkarni and Pedersen), Journal of Intelligent Systems (Special Issue : Recent Advances in Knowledge-Based Systems and Their Applications), 17(1-3), 37-50, 2008.

2007

UMND2 : SenseClusters Applied to the Sense Induction Task of Senseval-4 (Pedersen) - Appears in the Proceedings of SemEval-2007: 4th International Workshop on Semantic Evaluations, June 23-24, 2007, pp. 394-397, Prague, Czech Republic.
Unsupervised Discrimination of Person Names in Web Contexts (Pedersen and Kulkarni) - Appears in the Proceedings of the Eighth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 299-310, February 18-24, 2007, Mexico City. [acceptance rate 29%] Download the data used in this paper (Kulkarni name corpus).
Discovering Identities in Web Contexts with Unsupervised Clustering (Pedersen and Kulkarni) - Appears in the Proceedings of the IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data, pp. 23-30, January 8, 2007, Hyderabad, India. Download the data used in this paper (Kulkarni name corpus).

2006

Determining Smoker Status using Supervised and Unsupervised Learning with Lexical Features (Pedersen) - Appears in the Working Notes of the i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, November 10-11, 2006, Washington, DC.
Unsupervised Context Discrimination and Automatic Cluster Stopping (Kulkarni and Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2006/90, August 2006. [Note: This is Anagha's MS thesis, from July 2006.]
How many different "John Smiths", and who are they? (Kulkarni and Pedersen) - Appears in the Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 1885-1886, July 19, 2006, Boston, MA. (Student Poster)
Unsupervised Corpus Based Methods for WSD (Pedersen), In Agirre, E. and Edmonds, P. (Editors), Word Sense Disambiguation: Algorithms and Applications, June 2006, pp. 133-166, Springer.
Automatic Cluster Stopping with Criterion Functions and the Gap Statistic (Pedersen and Kulkarni), Appears in the Proceedings of the Demonstration Session of the Human Language Technology Conference and the Sixth Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 276-279, June 6, 2006, New York City.
Selecting the "Right" Number of Senses Based on Clustering Criterion Functions (Pedersen and Kulkarni), Appears in the Proceedings of the Posters and Demo Program of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, pp. 111-114, April 5-7, 2006, Trento, Italy. [acceptance rate 40%]
Improving Name Discrimination : A Language Salad Approach (Pedersen, Kulkarni, Angheluta, Kozareva, and Solorio) - Appears in the Proceedings of the EACL 2006 Workshop on Cross-Language Knowledge Induction, pp. 25-32, April 3, 2006, Trento, Italy. Download the Bulgarian, English, Spanish, and Romanian data used in this paper.
An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-occurrence Features (Pedersen, Kulkarni, Angheluta, Kozareva, and Solorio) - Appears in the Proceedings of the Seventh International Conference on Intelligent Text Processing and Computational Linguistics, pp. 208-222, February 19-25, 2006, Mexico City. [acceptance rate 30%] Download the Bulgarian, English, Spanish, and Romanian data and stoplists used in this paper.

2005

Name Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts (Kulkarni and Pedersen) - Appears in the Proceedings of the Second Indian International Conference on Artificial Intelligence, pp. 703-722, December 20-22, 2005, Pune, India. [acceptance rate 35%] Download the data used in this paper.
Identifying Similar Words and Contexts in Natural Language with SenseClusters (Pedersen and Kulkarni) - Appears in the Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 1694-1695, July 12, 2005, Pittsburgh, PA. (Intelligent Systems Demonstration) Download the data used in this demo.
Unsupervised Discrimination and Labeling of Ambiguous Names (Kulkarni) - Appears in the Proceedings of the Student Research Workshop of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 145-150, June 27, 2005, Ann Arbor, MI. [acceptance rate 28%] Download the data used in this paper.
SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts (Kulkarni and Pedersen) - Appears in the Proceedings of the Demonstration and Interactive Poster Session of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 105-108, June 26, 2005, Ann Arbor, MI. [acceptance rate 55%] Download the data used in this paper.
Resolving Ambiguities in Biomedical Text with Unsupervised Clustering Approaches (Savova, Pedersen, Purandare, and Kulkarni) - University of Minnesota Supercomputing Institute Research Report UMSI 2005/80 and CB Number 2005/21, May 2005.
Name Discrimination by Clustering Similar Contexts (Pedersen, Purandare, and Kulkarni) - Appears in the Proceedings of the Sixth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 220-231, February 13-19, 2005, Mexico City. [acceptance rate 37%] Download the data used in this paper.

2004

Improving Word Sense Discrimination with Gloss Augmented Feature Vectors (Purandare and Pedersen) - Appears in the Proceedings of the Workshop on Lexical Resources for the Web and Word Sense Disambiguation, pp. 123-130, November 22, 2004, Puebla, Mexico.
Word Sense Discrimination by Clustering Similar Contexts (Purandare and Pedersen), University of Minnesota Supercomputing Institute Research Report UMSI 2004/146, September 2004. [Note: This is Amruta's MS thesis, from August 2004.]
Discriminating Among Word Meanings by Identifying Similar Contexts (Purandare and Pedersen) - Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 964-965, July 25-29, 2004, San Jose, CA (Student Abstract) [ppt]
SenseClusters - Finding Clusters that Represent Word Senses (Purandare and Pedersen) - Appears in the Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), pp. 1030-1031, July 25-29, 2004, San Jose, CA (Intelligent Systems Demonstration)
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces (Purandare and Pedersen) - Appears in the Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pp. 41-48, May 6-7, 2004, Boston, MA. [acceptance rate 48%]
SenseClusters - Finding Clusters that Represent Word Senses (Purandare and Pedersen) - Appears in the Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-04), pp. 26-29, May 3-5, 2004, Boston, MA. (Demonstration System)

2003

Discriminating Among Word Senses Using McQuitty's Similarity Analysis (Purandare) - Appears in the Proceedings of the Student Research Workshop at HLT-NAACL, pp. 19-24, May 30-31, 2003, Edmonton, Canada. [ppt]

1998

Knowledge Lean Word Sense Disambiguation (Pedersen and Bruce) - Appears in the Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pp. 800-805, July 28-30, 1998, Madison, WI. [acceptance rate 30%]
Raw Corpus Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), p. 1198, July 28-30, 1998, Madison, WI. (Student Poster)
Learning Probabilistic Models of Word Sense Disambiguation (Pedersen), May 1998, Southern Methodist University, 195 pages. (PhD Dissertation) (Also available from CMP-LG E-Print Archive as 0707.3972)

1997

Distinguishing Word Senses in Untagged Text (Pedersen and Bruce) - Appears in the Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (EMNLP-2), pp. 197-207, August 1-2, 1997, Providence, RI. [acceptance rate 35%] (Also available from CMP-LG E-Print Archive as #9706008)
Knowledge Lean Word Sense Disambiguation (Pedersen) - Appears in the Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), p. 814, July 27-31, 1997, Providence, RI. (Doctoral Consortium)

By: Ted Pedersen - tpederse AT d umn edu