CS 5761 - Introduction to Natural Language Processing
Project Alpha version due by 4pm Friday April 12. Email the URL of
your tar file to patw0006 and tpederse.
Objectives
To produce a preliminary version of your project, where you have
implemented a simple baseline approach by which you can judge your
progress later in the semester.
Specification
You should provide the following in your tar file.
- A README that describes all the files you are providing, as well
as instructions on how to use them.
- An implementation of your baseline algorithm. The nature of the
baseline will vary from project to project, but it should be a subset of
your full solution.
- An evaluation program that will allow you to score the results of
your baseline. It is not enough for your program to produce output; you
must also have some means of evaluating that output. Most likely this
means comparing it to output for which you know the correct answers.
- A driver script written in a shell scripting language that ties all
the pieces together. Consider using csh, bash, or ksh.
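As one illustration of how a baseline and its evaluation program fit together, here is a minimal Python sketch for a part of speech tagging project. It assumes a most-frequent-tag baseline (each word gets the tag it carried most often in training, and unseen words get the most frequent tag overall) and scores the output by per-token accuracy against known correct tags. The function names and the in-memory data format (sentences as lists of (word, tag) pairs) are hypothetical; reading real Penn Treebank files is not shown, and your own baseline may look quite different.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Hypothetical baseline trainer: map each word to its most frequent tag.

    tagged_sentences is assumed to be a list of sentences, each a list of
    (word, tag) pairs. Returns the word->tag lexicon and a default tag
    (the most frequent tag overall) for words never seen in training.
    """
    word_tag_counts = defaultdict(Counter)  # word -> Counter of its tags
    tag_totals = Counter()                  # tag -> count over all tokens
    for sentence in tagged_sentences:
        for word, pos in sentence:
            word_tag_counts[word][pos] += 1
            tag_totals[pos] += 1
    lexicon = {word: counts.most_common(1)[0][0]
               for word, counts in word_tag_counts.items()}
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Tag each word with its lexicon entry, or default_tag if unseen."""
    return [lexicon.get(word, default_tag) for word in words]


def accuracy(predicted, gold):
    """Evaluation: fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold tag sequences differ in length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy training data standing in for real tagged text (hypothetical).
    train = [
        [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
        [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
        [("the", "DT"), ("runs", "NNS"), ("end", "VBP")],
        [("dog", "NN"), ("food", "NN"), ("sells", "VBZ")],
    ]
    lexicon, default_tag = train_most_frequent_tag(train)

    # Test sentence with its known correct ("gold") tags.
    words = ["the", "cat", "runs", "quickly"]
    gold = ["DT", "NN", "VBZ", "RB"]

    predicted = tag(words, lexicon, default_tag)
    print(predicted)
    print(f"accuracy: {accuracy(predicted, gold):.2f}")
```

Run directly, this prints `['DT', 'NN', 'VBZ', 'NN']` and `accuracy: 0.75`: the unseen word "quickly" falls back to the default tag NN and is scored as an error. In a real project, the driver script would run the baseline and then the evaluation program on your actual data.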
Have a friend do a test download of your code to make sure it is
working. You will be surprised how many things you take for granted in
using your own code that are not apparent to someone else.
Anyone who downloads your tar file should be able to unpack the code,
look at the README, and then run the driver script within just a few
minutes. There should be very few demands placed on the user in order to
figure out how to run your project code. If we can't run your code
after at most 5 minutes of trying, you won't get credit for this portion
of the project. Assume that we are running on a csdev machine.
For part-of-speech tagging projects, assume that I have the Penn Treebank
available. For authorship id projects assume that I have a few texts from
Project Gutenberg available. These will be of my choosing. For other
projects please provide your data in the tar file, unless that makes your
download rather large. In that case contact me ahead of time and we'll
arrange something.
The alpha version counts for 20% of the total project grade. Late
submissions (after 4pm Friday April 12) will not be downloaded.
Policies (from syllabus)
All programming assignments and your project will be demonstrated during
designated lab sessions. You should also submit an electronic copy of
your source code to the TA prior to the designated demo session. (His
email address is patw0006@d.umn.edu.) There is no other way to submit
your programming assignments or project. Failure to submit AND demo on
time will result in a zero.
Any code you submit should be commented. I must be able to understand
what your code does simply by reading the comments. This understanding
should extend down to the details of your code. So do not simply
describe the input and output; also include comments that describe
your particular algorithm and coding techniques. Failure to comment
to this degree will result in a zero.
All assignments and the project are to be done individually. You are
required to write your own code. Unless otherwise specified, you must
only turn in code that you personally wrote. The only possible exception
to this is if I tell you to use a module that is available in a book
or online archive. However, I will clearly indicate when this is
permissible. Violations of this policy will result in severe grading
penalties and/or failure in the class.