    teamname.pl file1.utx

where file1.utx is a parallel corpus formatted as in assignment 4. (Your program may accept multiple input files, although this is not required.) The input and output of your program should be in Unicode. You may assume that the input to the program is limited to one pair of languages; in other words, file1.utx is assumed to contain a single language pair (English and one other language). However, you should try to make your sentence alignment program language independent: one time you might run it with English-French, the next time with English-Spanish, and so on.
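One language-independent way to attack the alignment is to use only sentence lengths, since lengths can be measured in any script. The sketch below is a simplified length-based aligner in the spirit of Gale & Church (1993). It is written in Python purely for illustration (the assignment asks for a Perl program), and it makes several assumptions: sentences are already split into lists of strings, the alignment types and their prior probabilities, and the constants `C` and `S2`, roughly follow that paper and should be treated as tunable. The function names `align` and `match_cost` are hypothetical, and reading or writing the .utx format is left out.

```python
import math

# Assumed prior probabilities for each bead type
# (number of source sentences, number of target sentences),
# roughly following Gale & Church (1993). Tune for your data.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0    # assumed expected ratio of target to source characters
S2 = 6.8   # assumed variance of that ratio


def match_cost(len1, len2, prior):
    """Negative log probability of pairing blocks of len1 and len2 characters."""
    mean = (len1 + len2 / C) / 2
    z = 0.0 if mean == 0 else (C * len1 - len2) / math.sqrt(S2 * mean)
    # Two-tailed probability that a standard normal exceeds |z|.
    p_len = math.erfc(abs(z) / math.sqrt(2))
    return -math.log(max(p_len * prior, 1e-300))


def align(src, tgt):
    """Return a list of (source_indices, target_indices) beads covering both texts."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees cost[i][j] is final before we extend it.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len1 = sum(len(s) for s in src[i:ni])
                len2 = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + match_cost(len1, len2, prior)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the cheapest bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

For example, `align(["Hello world.", "This is a test."], ["Bonjour le monde.", "Ceci est un test."])` should pair the sentences one-to-one, while a single source sentence whose translation was split in two should come out as a 1-2 bead. Because the cost depends only on character counts, nothing in the sketch is specific to English-French or English-Spanish; a real submission would likely add lexical cues (cognates, punctuation, numbers) to improve on length alone.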
    teamname-eval.pl gold.utx output.utx

where output.utx is the output created by your alignment program, and gold.utx is the manually aligned (gold standard) version of the same data. teamname-eval.pl should compare these two files and produce a score of some sort that measures the effectiveness of the alignment program. gold.utx should follow the same format as output.utx; the only difference is that gold.utx is created manually, while output.utx is the aligned version of that data produced by your program.
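The choice of score is left open. One common option is precision, recall and F1 over alignment beads. The Python sketch below (an illustration only; the evaluation program itself is to be written in Perl) assumes that both gold.utx and output.utx have already been parsed into lists of beads, each bead being a pair of source-sentence indices and target-sentence indices. The function name `score_alignment` is hypothetical, and the parsing of the .utx format is not shown.

```python
def score_alignment(gold, output):
    """Precision, recall and F1 of output beads measured against gold beads.

    Each bead is a pair (source_indices, target_indices). A bead counts as
    correct only if exactly the same bead appears in the gold alignment.
    """
    gold_set = {(tuple(s), tuple(t)) for s, t in gold}
    out_set = {(tuple(s), tuple(t)) for s, t in output}
    correct = len(gold_set & out_set)
    precision = correct / len(out_set) if out_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, with gold beads `[([0], [0]), ([1], [1, 2])]` and program output `[([0], [0]), ([1], [1]), ([], [2])]`, only the first bead matches, giving a precision of 1/3, a recall of 1/2 and an F1 of 0.4. Exact bead matching is strict: an output that splits a gold 1-2 bead gets no credit for it. A more lenient alternative is to score the individual source-target sentence links implied by each bead.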
    turnin -c cs8995 -p p1a teamname.pl          (sentence alignment program)
    turnin -c cs8995 -p p1b teamname.utx         (gold standard data)
    turnin -c cs8995 -p p1c teamname.(pdf|ps)    (written report)
    turnin -c cs8995 -p p1d teamname-eval.pl     (evaluation program)

This is a team project. Please consult and work closely with your team members. You may divide the work as you see fit, and you have considerable discretion in your approach to this problem.
by: Ted Pedersen - tpederse@d.umn.edu