CS 8995 Corpus Based Natural Language Processing
Final Project: Empirical Methods for Multilingual Text
Stage 1 - Gold Standard Data as submitted by each team on 3/23
Download the gold standard data from each team and run your sentence
alignment program on it. Your sentence alignment program should be
named teamname.pl (where teamname is your team's name, e.g. morelia or
toluca), and your sentence alignment evaluation program should be named
teamname-eval.pl. Remember to remove the alignment tags from the gold
standard data before you feed it to your sentence aligner; the alignment
tags should be used only by the evaluation program. So, if you are team
TOLUCA and you are running your aligner on the MORELIA gold standard
data, the steps you follow might look like this:
remove-align-tags.pl morelia.utx > morelia-notag.utx
toluca.pl morelia-notag.utx > morelia-align.utx
toluca-eval.pl morelia.utx morelia-align.utx
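If you want to repeat these three steps for every team's gold standard file, they can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name run_stage1 and the variable MYTEAM are hypothetical, the three .pl scripts are assumed to be executable and on your PATH (as the commands above suggest), and each team's gold standard file is assumed to be named after the team (morelia.utx, toluca.utx, and so on).

```shell
#!/bin/sh
# Hypothetical helper wrapping the three Stage 1 steps for one gold
# standard file. MYTEAM is an assumption: set it to your own team name.
MYTEAM=toluca

run_stage1() {
    gold=$1    # team whose gold data we use, e.g. morelia -> morelia.utx

    # 1. Strip the alignment tags so the aligner never sees them.
    remove-align-tags.pl "$gold.utx" > "$gold-notag.utx" || return 1

    # 2. Run our sentence aligner on the untagged text.
    "$MYTEAM.pl" "$gold-notag.utx" > "$gold-align.utx" || return 1

    # 3. Evaluate against the original, still-tagged gold standard.
    "$MYTEAM-eval.pl" "$gold.utx" "$gold-align.utx"
}

# Usage (team list is illustrative):
#   for gold in morelia toluca; do run_stage1 "$gold"; done
```

Keeping the tagged file (morelia.utx) untouched and writing the untagged copy to a separate file means the evaluation step can still read the gold alignments, which is exactly the separation the handout asks for.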
Please email a summary of your results to tpederse. The table
described in this evaluation note is all I need. Make sure to
identify which team you are submitting results for. I plan to
present and discuss these results in class on Monday 3/26, so please
send them as far ahead of class time as possible.
by: Ted Pedersen - tpederse@d.umn.edu