CS 5761 - Introduction to Natural Language Processing
Project Proposal due by noon Tuesday March 26 via email to tpederse
and patw0006. Please submit plain text, with no attachments.
Objectives
To find a topic for your project and do some background research on it.
Specification
You will complete an individual project that involves producing both a
Perl implementation and a written report dealing with some interesting
problem in Natural Language Processing. You may select your own topic.
My only requirements are that it involve text processing and that it be
mentioned, if only in passing, in our textbook.
Please make sure you do not choose an overly broad topic. 'Part of Speech
Tagging using a Trigram model and Good-Turing Smoothing' is an example of
an interesting and well-focused project. 'A Program to Translate English
Text into German' is potentially interesting but will take you several
decades to properly complete. Your project should be focused, but it
should not be trivial either. Implementing a minimum-edit distance
program like the one we did for one of our programming assignments is
focused but fairly trivial relative to the amount of time we are allowing
for project completion. Please check with me if you aren't sure whether
your topic is suitable.
By Tuesday March 26 you should have produced a project proposal. It should
include the following:
- Problem Description (1-2 paragraphs) : What is the problem you are
attempting to solve in your project? Describe it in general terms, and
explain why it is important. You should provide at least two references to
papers that discuss the same problem, and you should read these papers in
preparation for writing the proposal. Make sure you properly quote,
footnote, etc. any information you get from these papers (Don't pull an
Ambrose, as some now say...)
- Overview of Solution/Approach (1-2 paragraphs) : What is the general
approach you plan on taking? Will your approach follow one of the
references you mention in the description, or will you invent your
own? (It is ok to re-implement an existing solution, or propose your
own.) The description of your solution is of course tentative in that as
you begin to work on the problem a bit more you may realize that another
approach is better.
- Evaluation Plan (1-2 paragraphs) : How will you show that your
solution does something? Will you need to find or create "gold standard"
data to use as a point of comparison? If so, where will you get that, or
how will you create it? For example, suppose you are working on a
Context Sensitive Spelling Correction program based on n-grams (a good
focused topic). How will you measure how many misspelled words your
program finds, and how will you know that it proposes a proper correction?
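As a rough illustration of the spelling correction example above, here is
a minimal sketch of one way such an evaluation could be set up. It is
written in Python only for brevity (your project itself must still be in
Perl). It assumes three token lists of equal length that are aligned
position by position: the original text containing errors, a hand-corrected
gold standard, and your program's output. The function name
evaluate_corrections and the three measures it reports are illustrative
choices, not a required design.

```python
def evaluate_corrections(original, gold, system):
    """Compare system output to a gold standard, token by token.

    Assumption: all three lists are aligned, so position i in each list
    refers to the same word in the text.
    """
    if not (len(original) == len(gold) == len(system)):
        raise ValueError("token lists must be aligned and of equal length")

    errors = 0     # tokens that really are misspelled (original != gold)
    flagged = 0    # tokens the system changed (system != original)
    detected = 0   # real errors that the system changed
    corrected = 0  # detected errors changed to exactly the gold word

    for orig_tok, gold_tok, sys_tok in zip(original, gold, system):
        is_error = orig_tok != gold_tok
        is_flagged = sys_tok != orig_tok
        if is_error:
            errors += 1
        if is_flagged:
            flagged += 1
        if is_error and is_flagged:
            detected += 1
            if sys_tok == gold_tok:
                corrected += 1

    return {
        # Of the words the system changed, how many were real errors?
        "precision": detected / flagged if flagged else 0.0,
        # Of the real errors, how many did the system change?
        "recall": detected / errors if errors else 0.0,
        # Of the real errors it changed, how many did it fix properly?
        "correction_accuracy": corrected / detected if detected else 0.0,
    }


# Made-up example data: two real errors ("peace", "their"); the system
# fixes one of them and wrongly changes "a" to "an".
original = ["I", "want", "a", "peace", "of", "cake", "over", "their"]
gold = ["I", "want", "a", "piece", "of", "cake", "over", "there"]
system = ["I", "want", "an", "piece", "of", "cake", "over", "their"]

print(evaluate_corrections(original, gold, system))
```

On this made-up data the sketch reports a precision of 0.5, a recall of
0.5, and a correction accuracy of 1.0. Your proposal should say which
measures like these you intend to report and where the gold standard text
will come from.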
This does not need to be a lengthy document, but it should be well written
and carefully thought out. It will provide a road map for your project, so
the more you put into this, the more smoothly your project will go. If I
have significant concerns about your topic or some aspect of your proposal
I will let you know by Monday April 1 and possibly request that you choose
a new topic or provide additional details.
Policies (from syllabus)
All programming assignments and your project will be demonstrated during
designated lab sessions. You should also submit an electronic copy of
your source code to the TA prior to the designated demo session. (His
email address is patw0006@d.umn.edu.) There is no other way to submit
your programming assignments or project. Failure to submit AND demo on
time will result in a zero.
Any code you submit should be commented. I must be able to understand
what your code does simply by reading the comments. This understanding
should extend down to the details of your code. So do not simply
describe the input and output; also include comments that describe
your particular algorithm and coding techniques. Failure to comment
to this degree will result in a zero.
All assignments and the project are to be done individually. You are
required to write your own code. Unless otherwise specified, you must
turn in only code that you personally wrote. The only possible exception
to this is if I tell you to use a module that is available in a book
or online archive. However, I will clearly indicate when this is
permissible. Violations of this policy will result in severe grading
penalties and/or failure in the class.