CS 5761 - Introduction to Natural Language Processing
Project Beta version due by 5pm Thursday, April 29. Submit a tar
file with your system, and a pdf/doc file with your updated
proposal/report via webdrop. Demo in lab at 6pm that same day.
Objectives
To produce a beta version of your project, where you have implemented
the majority of your system's functionality, and have updated your
proposal so that it is suitable as the core of your final report.
System Requirements
You should turn in a tar file that unpacks into a directory named with
your user-id and the course number. For example, in my case this would be
tpederse-5761. Your tar file should include all of your system code and
data necessary for running your system and for evaluation.
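
As a rough illustration of the naming requirement above, the following
shell sketch builds a tar file that unpacks into a single directory. The
user-id "jdoe" and the placeholder README are assumptions made only for
this example; substitute your own user-id and project files.

```shell
#!/bin/sh
# Sketch: package a project so it unpacks into <user-id>-5761.
# "jdoe" and the README contents are placeholders, not requirements.
set -eu

USERID="jdoe"
PKG="${USERID}-5761"

# Build the package directory in a scratch area for this demonstration.
WORK=$(mktemp -d)
mkdir -p "$WORK/$PKG"
printf 'How to install and run\n' > "$WORK/$PKG/README"

# -C switches to the parent directory first, so every path stored in
# the archive starts with jdoe-5761/ rather than an absolute path.
tar -cf "$WORK/$PKG.tar" -C "$WORK" "$PKG"

# List the archive to confirm everything sits under one directory.
tar -tf "$WORK/$PKG.tar"
```

Listing the archive before you submit is a cheap way to catch stray
absolute paths or files that landed outside the top-level directory.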
Make sure you provide the following in your tar file:
- A README that describes all the files you are providing, as well
as instructions on how to use them. Make sure that your system is easy to
install, and does not have hard coded path names, etc. that will prevent
it from unpacking and running. If you have used any additional modules
from CPAN or other sources, make sure you provide instructions about how
to download and install those as well.
- Your system should include the majority of its functionality. How
that is organized and implemented is up to you, but it should be fairly
easy for me or another "untrained" user to install and run. Make sure to
test your system on the csdev platform before you submit, as this is where
we will be running and testing it.
- An evaluation program that will allow you to score the results of
your system. In other words, in addition to having a program that
creates Google Sets or analyzes Voynich text, you should also have a
program (and associated data) that can be used to analyze your system's
output.
- You may want to use driver scripts written in Perl or a shell
scripting language that tie all the pieces together.
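
To make the driver script and evaluation items above more concrete, here
is a minimal shell sketch. It locates its own directory instead of using a
hard coded path, runs a stand-in for the system, and then scores the
output against an answer key. The names run_system, run_eval, output.txt
and key.txt are hypothetical, and the scoring rule (count of exactly
matching lines) is only an assumption for the example; your own
evaluation measure will likely differ.

```shell
#!/bin/sh
# Sketch of a driver script. run_system and run_eval are stand-ins for
# your own programs; all names here are hypothetical.
set -eu

# Resolve the directory this script lives in, so that data files can be
# found relative to it and no absolute path needs to be hard coded.
BASE=$(cd "$(dirname "$0")" && pwd)

OUT=$(mktemp -d)

run_system() {
    # Placeholder for the real system. In practice this might be
    # something like: perl "$BASE/yourprogram.pl" > "$1"
    printf 'apple\nbanana\ncherry\n' > "$1"
}

run_eval() {
    # Placeholder scorer: count lines of system output ($1) that match
    # a whole line of the answer key ($2). grep exits non-zero when the
    # count is 0, so "|| true" keeps set -e from aborting the script.
    grep -c -x -F -f "$2" "$1" || true
}

# A tiny answer key, standing in for your evaluation data.
printf 'apple\ncherry\nplum\n' > "$OUT/key.txt"

run_system "$OUT/output.txt"
SCORE=$(run_eval "$OUT/output.txt" "$OUT/key.txt")
echo "correct: $SCORE"
```

Keeping the system and the evaluation as separate steps lets a user
re-score existing output without running the whole system again.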
A useful hint: Have a friend do a test installation of your code to make
sure it can be easily installed and run. You will be surprised how many
things you take for granted in using your own code that are not apparent
to someone else.
Anyone who uses your tar file should be able to unpack the code,
look at the README, and then run the system itself within just a
few minutes. Running your project code should place very few demands on
the user. Part of the grade for your final system submission will depend
on whether we can install, run, and evaluate your system quickly. Again,
make sure to
test on a csdev machine as this is the platform we'll use for testing.
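
One way to rehearse the unpack-and-run check yourself is sketched below in
shell. The helper check_tar is hypothetical; it unpacks a tar file into a
fresh scratch directory and confirms that it yields exactly one top-level
directory with the expected name and that a README is present. The demo
archive uses the made-up user-id "jdoe". This only approximates a test
installation; it does not replace trying your system on a csdev machine.

```shell
#!/bin/sh
# Sketch: smoke-test a submission tar file in a clean directory.
# check_tar is a hypothetical helper, not part of any course tooling.
set -eu

check_tar() {
    tarfile=$1      # path to the tar file
    expected=$2     # expected top-level directory, e.g. jdoe-5761
    scratch=$(mktemp -d)

    tar -xf "$tarfile" -C "$scratch"

    # The scratch directory should now hold exactly one entry, and it
    # should be the expected directory name.
    if [ "$(ls -A "$scratch")" != "$expected" ]; then
        echo "FAIL: archive does not unpack into $expected"
        return 1
    fi
    if [ ! -f "$scratch/$expected/README" ]; then
        echo "FAIL: no README in $expected"
        return 1
    fi
    echo "OK: $expected unpacks cleanly"
}

# Demonstration with a throwaway archive.
demo=$(mktemp -d)
mkdir "$demo/jdoe-5761"
echo 'usage notes' > "$demo/jdoe-5761/README"
tar -cf "$demo/jdoe-5761.tar" -C "$demo" jdoe-5761

check_tar "$demo/jdoe-5761.tar" jdoe-5761
```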
For the Voynich projects, make sure you provide the transcribed version of
the manuscript as a part of your project tar file. For the Google Sets
project, you may assume that I have WordNet already available, so you
don't need to provide that.
Proposal/Report Requirements
This version of your proposal (now morphing into a final report) should
contain all of the changes that I have mentioned in my comments of April 9
to the class, as well as any comments that have been made on either your
initial proposal or the alpha version. You should pay particular attention
to providing full details of your system's approach, as well as a detailed
description of how you will do evaluation.
Finally, remember that you want your final report to be a document that
someone who was not a part of this class could read and understand. So
please make sure you provide sufficient background regarding your
problem, and explain what you have done clearly and without making any
assumptions that the reader will be familiar with either Google Sets or
the Voynich Manuscript.