CS 8995 - Corpus Based Natural Language Processing - Spring 2001
Class Information:
Instructor Office Hours
Syllabus
Required Readings (week by week)
Program Samples (example code from lecture)
Final Project Teams (stage 1) (curious about the team names?)
Final Project Teams (stage 2)
Final Project:
Stage 1 Sentence Alignment: Gold Standard Data due Friday March 23, 4 pm; the rest is due Monday March 26, 4 pm. As of Mar 20 the evaluation requirements have been expanded! Please make sure to include the new information.
GOLD STANDARD DATA posted (3/23)
EVALUATION PROGRAMS posted (3/26)
EVALUATION RESULTS posted (3/27)
Stage 2 More Sentence Alignment, due Monday April 16
Consider using some of the bitext data now available from Assignment 4 for testing purposes. See the Assignment 4 entry below.
GOLD STANDARD DATA updated (4/16)
EVALUATION PROGRAMS posted (4/16)
Stage 3 Building a Translation Dictionary with the EM Algorithm (optional extra credit), due Thursday May 10
Programming Assignments:
All programming assignments should be turned in using turnin on machine hh33812.
Unless specified otherwise, assignments are to be completed individually.
Here is a reminder about that policy.
Assignment 1 Mutual Information, due Wed Jan 31, 4 pm
Solution Key for this text with N=10.
Assignment 2 Pointwise Mutual Information, due Mon Feb 12, 4 pm
An analogy of sorts that may provide a little guidance. A few more thoughts.
Preliminary info about the write-up.
Even more info about the write-up.
Assignment 3 N-gram Models, due Mon Feb 26, 4 pm
Solution Key using Witten-Bell smoothing with text1 and text2.
Assignment 4 Parallel Corpus Collection, due Wed Mar 07, 4 pm. A reminder about our objectives.
Here's the bitext that you created (posted 4/3/01). A note on how to use it for Stage 2.
Further details on Assignment 4 grading, as well as an EXTRA CREDIT OPPORTUNITY.
Perl Resources:
Sources of Text:
Other Resources:
Schedule
Lecture meets MW 4-5:40 pm in HH 302.
By:
Ted Pedersen
- tpederse@d.umn.edu
Last update: 1/21/2000