CS 5761 - Introduction to Natural Language Processing
Programming Assignment 2 - Demo in Lab on Monday, Feb 11 at 4pm
(submit code via email to patw0006@d.umn.edu before lab, and bring
written work to the lab)
Objectives
To gain experience with Finite State Automata and Perl regular expressions.
Specification
Design a Finite State Automaton that will accept any expression that
refers to a specific date (month, day, and/or year) and another that
will accept any expression that refers to a specific time of day,
regardless of how formally (e.g., the fifth day of March, eleven in the
morning) or concisely (e.g., Mar 5, 11am) it may be expressed. Refer
to problems 2.4 and 2.6 in the Jurafsky and Martin text (page 54) for
additional description.
Use these FSAs as the basis of a Perl program that puts tags around
dates and times in text. This program should use regular expressions
extensively. Remember that any FSA can be equivalently expressed as a
regular expression, so you should develop your FSAs first and then convert
them into regular expressions. Perl's regular expressions will take care
of much of the work in this assignment, if you allow them to!
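As a rough illustration of the FSA-to-regular-expression step, the sketch
below converts a toy three-edge FSA for clock times into a Perl pattern.
The FSA, the variable names, and the helper is_clock_time are assumptions
made up for this example. They cover only a narrow slice of time
expressions and are not the expected solution.

```perl
use strict;
use warnings;

# Hypothetical toy FSA for clock times such as "4:00 am" or "11pm":
#   q0 --hour--> q1 --":" minutes--> q2 --am/pm (optional)--> q3
#                q1 --am/pm--------------------------------> q3
# Each edge label becomes a sub-pattern, a path through the FSA becomes
# concatenation, and alternative paths become "|".
my $hour    = qr/(?:1[0-2]|0?[1-9])/;   # edge q0 -> q1: hours 1 through 12
my $minutes = qr/:[0-5][0-9]/;          # edge q1 -> q2: ":00" through ":59"
my $ampm    = qr/\s?[ap]\.?m\.?/i;      # edge into q3: am, pm, a.m., P.M.

# (?!\w) keeps the match from ending in the middle of a longer word.
my $clock_time = qr/\b$hour(?:$minutes(?:$ampm)?|$ampm)(?!\w)/;

# Hypothetical helper: true if the whole string is accepted by the toy FSA.
sub is_clock_time {
    my ($str) = @_;
    return $str =~ /\A$clock_time\z/ ? 1 : 0;
}

for my $candidate ("4:00 am", "11pm", "12:59 P.M.", "14:30", "4") {
    printf "%-12s %s\n", $candidate,
        is_clock_time($candidate) ? "accepted" : "rejected";
}
```

Keeping one named sub-pattern per edge label makes it easier to check the
regular expression against the diagram edge by edge.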
Time expressions should be marked by [time] and [/time], and
date expressions by [date] and [/date]. For example,
INPUT:
I was born at midnight on March 15. My sister will arrive at 4:00 am on
Monday, June 30. I can't remember if my father was born in April 1943
or 1944. Where were you at 1 o'clock on 11/21/00?
OUTPUT:
I was born at [time] midnight [/time] on [date] March 15 [/date] . My
sister will arrive at [time] 4:00 am [/time] on [date] Monday, June 30
[/date] . I can't remember if my father was born in [date] April 1943
[/date] or [date] 1944 [/date] . Where were you at [time] 1 o'clock
[/time] on [date] 11/21/00 [/date] ?
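The tagging step itself can be sketched with Perl's substitution operator.
The patterns below are deliberately tiny and hypothetical: they handle only
a few of the forms in the sample (not bare years, weekdays, or "1 o'clock"),
and they do not reproduce the space before punctuation shown in the sample
output. The helper name tag_text is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical, incomplete patterns covering a few forms from the sample.
my $month = qr/(?:January|February|March|April|May|June|July|August|September|October|November|December)/;
my $date  = qr{\b$month\s+\d{1,2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b};
my $time  = qr/\bmidnight\b|\bnoon\b|\b\d{1,2}:\d{2}\s?[ap]m\b/i;

# Hypothetical helper: wrap every time match, then every date match, in tags.
sub tag_text {
    my ($text) = @_;
    $text =~ s{($time)}{[time] $1 [/time]}g;
    $text =~ s{($date)}{[date] $1 [/date]}g;
    return $text;
}

print tag_text("I was born at midnight on March 15."), "\n";
# prints: I was born at [time] midnight [/time] on [date] March 15 [/date].
```

Capturing the whole match in $1 and reinserting it between the tags leaves
the original wording untouched, so only the tags are added to the text.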
Your program will be demoed on text from the Wall Street Journal similar
to the sample input above. Please turn in your FSA diagrams in the lab.
You may draw them with pencil and paper, but they should be legible.
Make sure that all edges are labeled.
Policies (from syllabus)
All programming assignments and your project will be demonstrated during
designated lab sessions. You should also submit an electronic copy of
your source code to the TA prior to the designated demo session. (His
email address is patw0006@d.umn.edu.) There is no other way to submit
your programming assignments or project. Failure to submit AND demo on
time will result in a zero.
Any code you submit should be commented. I must be able to understand
what your code does simply by reading the comments. This understanding
should extend down to the details of your code. Do not simply
describe the input and output; also include comments that describe
your particular algorithm and coding techniques. Failure to comment
to this degree will result in a zero.
All assignments and the project are to be done individually. You are
required to write your own code. Unless otherwise specified, you may
turn in only code that you personally wrote. The only possible exception
to this is if I tell you to use a module that is available in a book
or online archive. However, I will clearly indicate when this is
permissible. Violations of this policy will result in severe grading
penalties and/or failure in the class.