CS 8761 Natural Language Processing - Fall 2002 - MRD + Web => Corpus
Final Project - Stage III - Sentiment Classification
This may be revised in response to your questions. Last update Wed
Dec 11 10:00 am
PLEASE NOTE: Do not collaborate outside your team.
EACH TEAM IS EXPECTED TO WORK INDEPENDENTLY. DO NOT DISCUSS YOUR PROJECT
WITH ANYONE OUTSIDE OF YOUR TEAM. DO NOT SHARE RESOURCES OUTSIDE OF YOUR
TEAM. THIS INCLUDES CODE, PAPERS, IDEAS, EMAIL, WEB SITES, ETC.
IF TWO TEAMS ARE FOUND TO HAVE DELIBERATELY SHARED IDEAS, RESOURCES, ETC.
THEN ALL TEAM MEMBERS WILL RECEIVE A ZERO ON THIS PROJECT.
Objectives
Invent an algorithm to perform sentiment classification. Implement and
evaluate it. Your algorithm must combine corpus-based information from
the World Wide Web with information from a machine-readable dictionary.
Specification
You must develop a solution to the sentiment classification problem with
your teammates. In particular, we want to automatically classify reviews
as positive or negative simply by referring to their content, without
referring to the "summary judgment" of the reviewer.
Your team must develop one approach that employs the World Wide Web,
LDOCE, AND Big Mac. All of these resources must be present in
your solution. You may only access these resources through your Stage
I and Stage II interfaces, which you are free to improve during the
remainder of this project. All the data used by your algorithm must
come through these interfaces. You should not encode any of your
intuitions in your algorithm. Your program should be completely "data
driven", where that data comes from the WWW and your MRDs.
As sources of inspiration, I have provided you with three papers: one by
Niwa and Nitta (COLING-1994), one by Pang et al. (EMNLP-2002), and one
by Turney (ACL-2002). These are provided simply to give you ideas of
what other people have done with this and similar problems, and to provide
some additional background on the problem. You should not simply attempt
to re-implement one of these techniques. I would like your team to develop
a new approach that combines a machine readable dictionary with corpus
information from the World Wide Web! You are welcome to read up on the
literature of "sentiment classification". This is an increasingly popular
area of research so you can find related work other than the above.
The data employed by Pang et al. (EMNLP-2002) is available here.
I have also put that data in our CS8761 directory as movie-data. We will
use that as part of our evaluation.
In addition, each team must collect some review data from the Internet.
Each team should locate at least 100 reviews, approximately 50 negative
and 50 positive, for something other than movies! Your reviews should be
from a single domain (restaurants, cars, etc.). Each review should have
at least 200 word tokens, and be downloaded and stored in its own file
within a directory structure identical to the movie-data. This just
means that positive and negative examples should be in different
directories (pos and neg).
Please note that you should collect this data manually, and not attempt
to automatically locate this data. Your Stage II tool is not appropriate
for this task. Also, make sure you find reviews where there is a clearly
identifiable positive or negative "summary" from the reviewer. Your
algorithm must *not* use this summary judgment as a part of its decision
process. We just want it for evaluation purposes.
The team-collected review data is now available here.
You must use your modules from Stage I and II as your only interfaces to
the World Wide Web, LDOCE, and Big Mac. These interfaces should be
your only sources of data. Do not draw upon data from other sources. In
particular, do not take the reviews you are going to classify as a source
of data. In other words, we want to avoid a supervised learning technique
such as we employed with word sense disambiguation.
Your team will also need to submit final versions of your Stage I and
Stage II submissions. I expect that they will be "CPAN ready". Please
remember that any data your solution requires must come through these
interfaces!
Your sentiment classification program should use your stage I and
stage II modules, and should run as follows:
sentiment.pl DIRECTORY
where DIRECTORY is a directory of reviews where positive and negative
reviews are divided into positive (pos) and negative (neg) subdirectories.
(structured like the movie-data directory). Each review is in a
separate file. Your program should write to standard output. In general
it should show the file name of each review, its known classification,
and then the judgment of your system. Your system should have THREE
judgments: positive, negative, or can't decide. You should then compute
precision and recall for your method based on the formulation of these
scores in our textbook.
The following is an example of what your output should look like:
../test-data/neg/review1.txt negative negative
../test-data/neg/review2.txt negative positive
../test-data/pos/review1.txt positive positive
../test-data/pos/review2.txt positive undecided
2 1 1
Precision => 0.67
Recall => 0.50
F-measure => 0.57
The values 2 1 1 indicate that 2 reviews were classified correctly, 1 was
classified incorrectly, and that 1 was not classified (undecided).
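The scoring step can be sketched as follows. This is a minimal sketch, not a required implementation; it assumes (per the formulation above) that an undecided review counts against recall but not against precision. Verify the exact definitions against our textbook.

```perl
use strict;
use warnings;

# Compute precision, recall, and F-measure from the three counts
# your program prints (correct, incorrect, undecided). Undecided
# reviews reduce recall but do not count against precision.
sub score {
    my ($correct, $incorrect, $undecided) = @_;
    my $attempted = $correct + $incorrect;       # reviews actually labeled
    my $total     = $attempted + $undecided;     # all reviews seen
    my $precision = $attempted ? $correct / $attempted : 0;
    my $recall    = $total     ? $correct / $total     : 0;
    my $f         = ($precision + $recall)
                  ? 2 * $precision * $recall / ($precision + $recall)
                  : 0;
    return ($precision, $recall, $f);
}

# The "2 1 1" counts from the sample output above:
my ($p, $r, $f) = score(2, 1, 1);    # precision 2/3, recall 2/4
printf "Precision => %.2f\n", $p;
printf "Recall => %.2f\n", $r;
printf "F-measure => %.2f\n", $f;
```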
You should also produce a trace file (trace.txt) that "explains" why your
system reached the judgment it did for a particular review. The
explanation can just be whatever values or facts that your algorithm used
for reaching a particular conclusion. Make sure that your trace clearly
indicates which file is being processed, and attempt to show the evidence
that is being gathered to make a decision about that review.
Deadlines!
- Friday, December 6, 5pm. Written description of your algorithm AND
a beta implementation.
- Monday, December 9, 5pm. 100 reviews from each team, formatted like
movie-data. Clean up the data as much as is feasible, but it need not be
perfect.
- Friday, December 13, 5pm. Final versions of your Stage I and Stage
II modules. These should be CPAN ready, and each module should include 20
test cases.
- Monday, December 16, 4-6pm. Make formal presentation to class during
final exam period where you describe the algorithm, show a worked
example, discuss your experimental results, and explain why your method
works (or doesn't) and what you might have done differently to improve
the results. Please prepare a PowerPoint presentation that can be run on
the computer or shown via overheads. Each team will have 20 minutes. You
may designate one person to give the presentation, or you may share that
responsibility amongst yourselves in some way. That's up to you.
- Tuesday, December 17, noon. Final version of your project.
Documentation
There are three pieces of documentation that are required for your
sentiment program.
- Make sure that sentiment.pl is commented to the extent that I can
clearly see when you are accessing different sources of information
(i.e., when it is using LDOCE, BigMac, or WebReader), and that I can
follow the flow of the algorithm somewhat as well.
- Provide an updated version of algorithm.txt that explains your
final algorithm in detail.
- Provide a detailed report in a file called analysis.txt, which
should describe several experiments with sentiment.pl. You should conduct
experiments with at least four sources of data:
- review data your team collected
- review data from one other team
- review data from Pang et al. (movie-data, at least 50 from pos and
50 from neg)
- Create a sanity-data review set where the "reviews" are not actually
reviews but are rather something else completely. You might consider
chopping up text from Project Gutenberg to create such data. Please turn
this data in with your final project submission. You should have at least
50 positive and negative "reviews" in this data.
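One way to build such sanity data is sketched below: chop a plain-text file (e.g., something from Project Gutenberg) into review-sized pieces and split them evenly between pos and neg, since these "reviews" carry no real sentiment either way. The 200-word chunk size and the file naming here are my own choices, not part of the spec.

```perl
use strict;
use warnings;

# Chop a block of plain text into fixed-size "reviews" of
# $words_per_review word tokens each; any leftover tail shorter
# than one full review is discarded.
sub chop_into_reviews {
    my ($text, $words_per_review) = @_;
    my @words = grep { length } split /\s+/, $text;
    my @reviews;
    while (@words >= $words_per_review) {
        push @reviews, join ' ', splice(@words, 0, $words_per_review);
    }
    return @reviews;
}

# Usage sketch:  perl chop.pl gutenberg.txt
# (assumes sanity-data/pos and sanity-data/neg already exist)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    my @reviews = chop_into_reviews($text, 200);
    foreach my $i (0 .. $#reviews) {
        my $subdir = ($i % 2 == 0) ? 'pos' : 'neg';
        open my $out, '>', "sanity-data/$subdir/review$i.txt"
            or die "cannot write review$i.txt: $!";
        print $out $reviews[$i], "\n";
        close $out;
    }
}
```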
Report your results for each set of data, and discuss what these results
lead you to conclude about the problem and your particular approach. How
could you possibly improve this approach? You have considerable latitude
in what you discuss, but please make an effort to actually analyze why
your method works or does not. Also discuss the amount of time it took
your method to carry out these experiments, and how that might be improved.
If possible, you should "decompose" your methods such that you can
report results using just one kind of information. For example, how well
does your method fare using just web data? Or just LDOCE? Or just BigMac?
If your method totally integrates these three sources of information,
then of course you can't decompose it. But if you have three fairly
separate techniques that can be isolated in this way, I will expect you
to carry out this kind of analysis and report results for the different
pieces as well as for the whole approach.
Submission Guidelines
When you submit your final project on Tue Dec 17, submit all stages of the
project. Thus, you may still make small changes to your Stage I and Stage
II modules, but in general my expectation is that these will be "fine
tuning" issues and not major changes in functionality over the version
you submitted on Fri Dec 13.
Please put your stage I and stage II modules into directories called LDOCE,
BigMac, and WebReader. Please use exactly those names. Please put those
within a main directory named after your team (please use Alianza, Melgar,
SportBoys, Cristal, and Minas as those names). This main directory should
also include your sentiment.pl program plus your various reports. It
should also include a directory called sanity-data that includes the
sanity check data. Thus, in your main directory you should have the
following:
LDOCE (directory)
BigMac (directory)
WebReader (directory)
sanity-data (directory)
sentiment.pl
analysis.txt
algorithm.txt
Of course you may have other supporting files, documentation, and so
forth. The above is the minimum expectation, and the required
organization and naming convention. Please carefully test the unpacking
and installation of your systems. If I can't unpack, install your modules,
and then run sentiment.pl immediately, your team will lose a significant
number of points.
You must also submit all of the data that you have cached from your web
searches such that I can run your programs using that rather than having
to actually redo the web searches. You must submit this in such a way
that your program can use it automatically without my having to do or
configure anything.
Make sure your team name, individual names, date, etc. are included in your
source code and on all of your submissions. You are encouraged to
submit your modules to the CPAN archive, and I will post your systems on
the class web page after the semester, so please provide appropriate
info about authors, copyrights, distribution, etc.
This is a team assignment. You are strongly advised to divide up the work
of the project into tasks that can be carried out in parallel by various
team members. All team members should be acknowledged in the comments,
etc., and all teammates will receive the same grade. Do not work with
other teams. Each team should operate independently of all the other teams.
Make your own decisions as a team and do not be influenced by the
decisions of other teams if you happen to hear of them accidentally. You
are free to work with your teammates as closely as is necessary.
by:
Ted Pedersen
- tpederse@umn.edu