CS 8761 Natural Language Processing - Fall 2002 - MRD + Web => Corpus
Final Project - Stage I - Beta version due, Mon Nov 18, noon
This may be revised in response to your questions. Last update: Mon Nov 4, 7:00 pm
Objectives
Develop Perl modules that provide an interface to LDOCE (Longman Dictionary of
Contemporary English) and the Macquarie Dictionary (aka Big Mac). Your team should develop a separate and independent interface to each dictionary.
Specification
Design and implement Perl modules that provide interfaces to LDOCE and Big Mac. The raw data
and documentation for these dictionaries are available at /home/cs/tpederse/CS8761/LDOCE3 and
/home/cs/tpederse/CS8761/BigMac.
These modules should be separate and independent. You must design them so that they are
suitable for distribution via CPAN. This includes providing documentation that can be read
with perldoc and following the standard method of installing modules. I will expect to
install and use your code as a Perl module just like any other Perl module I might get
from CPAN.
Your modules should be to LDOCE and Big Mac as WordNet::QueryData is to WordNet. In other
words, provide a Perl interface to the dictionary that returns the useful information
present in the dictionary so that it can be used in a Perl program. Treat the interface as
your only way to get information from the dictionary.
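As an illustration only, the sketch below shows one possible shape for such an interface: an object that is constructed once and then answers lookup queries, loosely in the spirit of WordNet::QueryData. The package name LDOCE::Sketch, its methods (new, headwords, senses), and the entries are hypothetical placeholders; a real module would build its index by parsing the dictionary files rather than from a hard-coded hash.

```perl
use strict;
use warnings;

# Hypothetical package name, used only for illustration.
package LDOCE::Sketch;

# Constructor. A real module would locate and parse the dictionary files
# here; this sketch accepts a pre-built hash ref of
# headword => [ { pos => ..., def => ... }, ... ].
sub new {
    my ($class, %args) = @_;
    my $self = { entries => $args{entries} || {} };
    return bless $self, $class;
}

# Return every headword the interface knows about, in sorted order.
sub headwords {
    my ($self) = @_;
    return sort keys %{ $self->{entries} };
}

# Return the sense records for a headword (case-insensitive lookup),
# optionally restricted to one part of speech. Unknown words yield ().
sub senses {
    my ($self, $word, $pos) = @_;
    my $list = $self->{entries}{ lc $word } or return ();
    return grep { !defined $pos || $_->{pos} eq $pos } @$list;
}

package main;

# Placeholder entries for illustration only; these are NOT taken from LDOCE.
my $dict = LDOCE::Sketch->new(
    entries => {
        bank => [
            { pos => 'noun', def => 'placeholder definition 1' },
            { pos => 'noun', def => 'placeholder definition 2' },
            { pos => 'verb', def => 'placeholder definition 3' },
        ],
    },
);

my @nouns = $dict->senses('Bank', 'noun');
print scalar(@nouns), " noun senses\n";    # prints "2 noun senses"
```

Returning plain lists of hash references keeps the calling code simple; whether sense records should instead be objects, and which fields they carry, depends on what each dictionary actually stores.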
QueryData is a Perl module and we have it (along with WordNet) installed on our system.
Please experiment with it a bit to see what sort of capabilities it has, and use it to get
some ideas of the kinds of capabilities you might want to support. Remember, of course, that
WordNet, LDOCE and Big Mac are all quite distinct and contain different kinds of
information that is stored in different ways.
When your team thinks about the type of capabilities that you will need to support in your
interface, assume that your interface is the only way to access the data in the dictionary.
Make sure you don't overlook any significant sources of information provided in the
dictionary. This might require that you spend a little time looking at the structure of the
dictionary and determining what is provided and what is not.
Documentation
Provide sufficient documentation (to be read via perldoc) that I can install your
module and start to use it without much difficulty. Imagine that I have limited experience
with LDOCE and Big Mac and that I am not able to view the source SGML files. Your interface
is my only way to see and access that data. Your documentation is my only source of
information about your code and its capabilities.
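perldoc displays documentation written in POD (Plain Old Documentation) embedded in the module file itself. Below is a hypothetical skeleton of a module file with conventional POD sections; the module name LDOCE::Sketch, its methods, and the path argument are placeholders, and the method bodies are stubs.

```perl
package LDOCE::Sketch;    # hypothetical module name

use strict;
use warnings;

our $VERSION = '0.01';    # a single version line that build tools can read

# Constructor stub: stores its arguments (e.g. a hypothetical "path"
# to the dictionary files) in the object.
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# Stub lookup method; a real implementation would consult the parsed data.
sub senses {
    my ($self, $word, $pos) = @_;
    return ();
}

1;

=head1 NAME

LDOCE::Sketch - hypothetical Perl interface to LDOCE

=head1 SYNOPSIS

  use LDOCE::Sketch;

  my $dict   = LDOCE::Sketch->new(path => '/home/cs/tpederse/CS8761/LDOCE3');
  my @senses = $dict->senses('bank', 'noun');

=head1 DESCRIPTION

Describe what information the dictionary contains and how each method
exposes it, assuming the reader cannot see the SGML source files.

=head1 METHODS

=head2 new(path => $dir)

Arguments, return value, and behaviour when the files are missing.

=head2 senses($word [, $pos])

What a sense record contains and how the optional part of speech filters it.

=head1 AUTHORS

Team name, member names, date, and contact addresses.

=head1 COPYRIGHT

Copyright and distribution terms.

=cut
```

Running perldoc on the installed module (or on the .pm file directly) renders these sections; the SYNOPSIS is the natural place for the example usages requested under Submission Guidelines.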
Submission Guidelines
Submit your modules to the web drop in two distinct tar files named for your team. Once
unpacked, things should be structured such that I can install your module using the standard
"3 step CPAN" install. The three steps are as follows.
perl Makefile.PL
make
make test
I should not have to do anything else to get each of your modules installed. I should be able
to include it in a program via a use statement. Please provide some example usages in your
documentation; your test files will also show how to use the code (see QueryData again as
an example).
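The three-step install relies on a Makefile.PL at the top level of the unpacked tar file. A minimal sketch using ExtUtils::MakeMaker follows, assuming a conventional layout with the module under lib/ and test scripts under t/; the module name, file path, and author string are hypothetical placeholders. (The h2xs utility that ships with Perl, run as h2xs -AX -n Module::Name, generates a similar starting skeleton.)

```perl
# Makefile.PL: minimal sketch; module name and paths are hypothetical.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME          => 'LDOCE::Sketch',          # hypothetical module name
    VERSION_FROM  => 'lib/LDOCE/Sketch.pm',    # reads $VERSION from the module
    ABSTRACT_FROM => 'lib/LDOCE/Sketch.pm',    # reads the POD NAME line
    AUTHOR        => 'Team name <address>',    # placeholder
    PREREQ_PM     => {},                       # non-core dependencies, if any
);
```

ABSTRACT_FROM assumes the module's POD has a NAME section of the form "Module::Name - short description". By default, make test runs every t/*.t script; the Test::More module, part of the core distribution since Perl 5.8, is a convenient way to write them.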
Make sure your team name, individual names, date, etc. are included in your source code.
Your code may well end up being distributed via CPAN, so provide appropriate information
about copyright, distribution, etc.
Submit your LDOCE and Big Mac interfaces separately. Submit only two tar files per team, one
for each dictionary. Coordinate with your teammates so you don't have multiple submissions.
This is a team assignment. You are strongly advised to divide up the
work of the project into tasks that can be carried out in parallel by
various team members. All team members should be acknowledged in the
comments, etc. and all teammates will receive the same grade. Do not work
with other teams. Each team should operate independently of all the
other teams. Make your own decisions as a team and do not be
influenced by the decisions of other teams if you happen to hear of
them accidentally. You are free to work with your teammates as closely
as is necessary.
by:
Ted Pedersen
- tpederse@umn.edu