CS 8995 Corpus Based Natural Language Processing
Final Project: Empirical Methods for Multilingual Text
Stage 2 - Gold Standard Data, updated as of 4/16
Download the gold standard data from each team and run your sentence
alignment program on it. Your sentence alignment program should be
named teamname.pl (where teamname is morelia, toluca, etc.), and your
sentence alignment evaluation program should be named
teamname-eval.pl. Remember to remove the alignment tags from the gold
standard data before you feed it to your sentence aligner; the alignment
tags should only be used by the evaluation program. For example, if you
are team TOLUCA and you are running your aligner on the MORELIA gold
standard data, the steps you follow might look like this:
remove-align-tags.pl morelia.utx > morelia-notag.utx
toluca.pl morelia-notag.utx > morelia-align.utx
toluca-eval.pl morelia.utx morelia-align.utx
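This handout does not say how teamname-eval.pl should score an alignment. As a hedged illustration only, one common approach is to compare the groups of aligned sentences ("beads") proposed by the aligner with those in the gold standard, and report precision, recall and F-measure over exact matches. The sketch is in Python rather than Perl; the bead representation (a pair of tuples of sentence indices) and the function name score_alignment are hypothetical, and reading the actual .utx alignment tags is not shown.

```python
"""Hypothetical sketch of scoring a sentence alignment against a gold standard.

ASSUMPTION: an alignment is a list of "beads", each pairing a tuple of
source-sentence indices with a tuple of target-sentence indices, e.g.
((1,), (1, 2)) for a 1-2 alignment. The real .utx tag format and the metric
expected from teamname-eval.pl are not specified by the handout.
"""

from typing import Iterable, Tuple

Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]


def score_alignment(gold: Iterable[Bead],
                    system: Iterable[Bead]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting only beads that match exactly."""
    gold_set = set(gold)
    system_set = set(system)
    correct = len(gold_set & system_set)
    # Precision: fraction of the aligner's beads that appear in the gold standard.
    precision = correct / len(system_set) if system_set else 0.0
    # Recall: fraction of gold-standard beads that the aligner found.
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: only the first bead of the aligner output matches the gold standard.
    gold = [((0,), (0,)), ((1,), (1, 2)), ((2,), (3,))]
    system = [((0,), (0,)), ((1,), (1,)), ((2,), (2, 3))]
    p, r, f = score_alignment(gold, system)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

In the toy data one of three beads matches, so all three scores come out to 1/3. Exact bead matching is a strict choice; an evaluation program could instead give partial credit for overlapping beads.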
Send me the alignment results as described here.
by: Ted Pedersen - tpederse@d.umn.edu