CS 8761 Natural Language Processing - Fall 2002 - MRD + Web => Corpus
Final Project - Stage III - Sentiment Classification
This may be revised in response to your questions. Last update Wed
Dec 11 10:00 am
PLEASE NOTE: Do not collaborate outside your team.
EACH TEAM IS EXPECTED TO WORK INDEPENDENTLY. DO NOT DISCUSS YOUR PROJECT
WITH ANYONE OUTSIDE OF YOUR TEAM. DO NOT SHARE RESOURCES OUTSIDE OF YOUR
TEAM. THIS INCLUDES CODE, PAPERS, IDEAS, EMAIL, WEB SITES, ETC.
IF TWO TEAMS ARE FOUND TO HAVE DELIBERATELY SHARED IDEAS, RESOURCES, ETC.
THEN ALL TEAM MEMBERS WILL RECEIVE A ZERO ON THIS PROJECT.
Objectives
Invent an algorithm to perform sentiment classification. Implement and
evaluate it. Your algorithm must combine corpus-based information from
the World Wide Web with information from a machine-readable dictionary.
Specification
You must develop a solution to the sentiment classification problem with
your teammates. In particular, we want to automatically classify reviews
as positive or negative simply by referring to their content, without
referring to the "summary judgment" of the reviewer.
Your team must develop one approach that employs the World Wide Web,
LDOCE, AND Big Mac. All of these resources must be present in
your solution. You may only access these resources through your Stage
I and Stage II interfaces, which you are free to improve during the
remainder of this project. All the data used by your algorithm must
come through these interfaces. You should not encode any of your
intuitions in your algorithm. Your program should be completely "data
driven", where that data comes from the WWW and your MRDs.
As sources of inspiration, I have provided you with three papers: one by
Niwa and Nitta (COLING-1994), one by Pang et al. (EMNLP-2002), and one
by Turney (ACL-2002). These are provided simply to give you ideas of
what other people have done with this and similar problems, and to provide
some additional background on the problem. You should not simply attempt
to re-implement one of these techniques. I would like your team to develop
a new approach that combines a machine readable dictionary with corpus
information from the World Wide Web! You are welcome to read up on the
literature of "sentiment classification". This is an increasingly popular
area of research so you can find related work other than the above.
The data employed by Pang et al. (EMNLP-2002) is available here.
I have also put that data in our CS8761 directory as movie-data. We will
use that as part of our evaluation.
In addition, each team must collect some review data from the Internet.
Each team should locate at least 100 reviews, approximately 50 negative
and 50 positive, for something other than movies! Your reviews should be
from a single domain (restaurants, cars, etc.). Each review should have
at least 200 word tokens, and be downloaded and stored in its own file
within a directory structure identical to the movie-data. This just
means that positive and negative examples should be in different
directories (pos and neg).
Please note that you should collect this data manually, and not attempt
to automatically locate this data. Your Stage II tool is not appropriate
for this task. Also, make sure you find reviews where there is a clearly
identifiable positive or negative "summary" from the reviewer. Your
algorithm must *not* use this summary judgment as a part of its decision
process. We just want it for evaluation purposes.
The team-collected review data is now available here.
You must use your modules from Stage I and II as your only interfaces to
the World Wide Web, LDOCE, and Big Mac. These interfaces should be
your only sources of data. Do not draw upon data from other sources. In
particular, do not take the reviews you are going to classify as a source
of data. In other words, we want to avoid a supervised learning technique
such as we employed with word sense disambiguation.
Your team will also need to submit final versions of your Stage I and
Stage II submissions. I expect that they will be "CPAN ready". Please
remember that any data your solution requires must come through these
interfaces!
Your sentiment classification program should use your stage I and
stage II modules, and should run as follows:
sentiment.pl DIRECTORY
where DIRECTORY is a directory of reviews where positive and negative
reviews are divided into positive (pos) and negative (neg) subdirectories.
(structured like the movie-data directory). Each review is in a
separate file. Your program should write to standard output. In general
it should show the file name of each review, its known classification,
and then the judgment of your system. Your system should have THREE
judgments: positive, negative, or can't decide. You should then compute
precision and recall for your method based on the formulation of these
scores in our textbook.
The following is an example of what your output should look like:
../test-data/neg/review1.txt negative negative
../test-data/neg/review2.txt negative positive
../test-data/pos/review1.txt positive positive
../test-data/pos/review2.txt positive undecided
2 1 1
Precision => 0.67
Recall => 0.50
F-measure => 0.57
The values 2 1 1 indicate that 2 reviews were classified correctly, 1 was
classified incorrectly, and that 1 was not classified (undecided).
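The scoring step can be sketched as follows. This is a minimal sketch, not a required implementation; it assumes (per the formulation above) that an undecided review counts against recall but not against precision. Verify the exact definitions against our textbook.

```perl
use strict;
use warnings;

# Compute precision, recall, and F-measure from the three counts
# your program prints (correct, incorrect, undecided). Undecided
# reviews reduce recall but do not count against precision.
sub score {
    my ($correct, $incorrect, $undecided) = @_;
    my $attempted = $correct + $incorrect;       # reviews actually labeled
    my $total     = $attempted + $undecided;     # all reviews seen
    my $precision = $attempted ? $correct / $attempted : 0;
    my $recall    = $total     ? $correct / $total     : 0;
    my $f         = ($precision + $recall)
                  ? 2 * $precision * $recall / ($precision + $recall)
                  : 0;
    return ($precision, $recall, $f);
}

# The "2 1 1" counts from the sample output above:
my ($p, $r, $f) = score(2, 1, 1);    # precision 2/3, recall 2/4
printf "Precision => %.2f\n", $p;
printf "Recall => %.2f\n", $r;
printf "F-measure => %.2f\n", $f;
```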
You should also produce a trace file (trace.txt) that "explains" why your
system reached the judgment it did for a particular review. The
explanation can just be whatever values or facts that your algorithm used
for reaching a particular conclusion. Make sure that your trace clearly
indicates which file is being processed, and attempt to show the evidence
that is being gathered to make a decision about that review.
Deadlines!
- Friday, December 6, 5pm. Written description of your algorithm AND
a beta implementation.
- Monday, December 9, 5pm. 100 reviews from each team, formatted like
movie-data. Clean up the data as much as is feasible, but it need not be
perfect.
- Friday, December 13, 5pm. Final versions of your Stage I and Stage
II modules. These should be CPAN ready, and each module should include 20
test cases.
- Monday, December 16, 4-6pm. Make formal presentation to class during
final exam period where you describe the algorithm, show a worked
example, discuss your experimental results, and explain why your method
works (or doesn't) and what you might have done differently to improve
the results. Please prepare a PowerPoint presentation that can be run on
the computer or shown via overheads. Each team will have 20 minutes. You
may designate one person to give the presentation, or you may share that
responsibility amongst yourselves in some way. That's up to you.
- Tuesday, December 17, noon. Final version of your project.
Documentation
There are three pieces of documentation that are required for your
sentiment program.
- Make sure that sentiment.pl is commented to the extent that I can
clearly see when you are accessing different sources of information
(i.e., when it is using LDOCE, BigMac, or WebReader), and that I can
follow the flow of the algorithm somewhat as well.
- Provide an updated version of algorithm.txt that explains your
final algorithm in detail.
- Provide a detailed report in a file called analysis.txt, which
should describe several experiments with sentiment.pl. You should conduct
experiments with at least four sources of data:
- review data your team collected
- review data from one other team
- review data from Pang et al. (movie-data, at least 50 from pos and
50 from neg)
- Create a sanity-data review set where the "reviews" are not actually
reviews but are rather something else completely. You might consider
chopping up text from Project Gutenberg to create such data. Please turn
this data in with your final project submission. You should have at least
50 positive and negative "reviews" in this data.
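One way to build such sanity data is sketched below: chop a plain-text file (e.g., something from Project Gutenberg) into review-sized pieces and split them evenly between pos and neg, since these "reviews" carry no real sentiment either way. The 200-word chunk size and the file naming here are my own choices, not part of the spec.

```perl
use strict;
use warnings;

# Chop a block of plain text into fixed-size "reviews" of
# $words_per_review word tokens each; any leftover tail shorter
# than one full review is discarded.
sub chop_into_reviews {
    my ($text, $words_per_review) = @_;
    my @words = grep { length } split /\s+/, $text;
    my @reviews;
    while (@words >= $words_per_review) {
        push @reviews, join ' ', splice(@words, 0, $words_per_review);
    }
    return @reviews;
}

# Usage sketch:  perl chop.pl gutenberg.txt
# (assumes sanity-data/pos and sanity-data/neg already exist)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    my @reviews = chop_into_reviews($text, 200);
    foreach my $i (0 .. $#reviews) {
        my $subdir = ($i % 2 == 0) ? 'pos' : 'neg';
        open my $out, '>', "sanity-data/$subdir/review$i.txt"
            or die "cannot write review$i.txt: $!";
        print $out $reviews[$i], "\n";
        close $out;
    }
}
```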
Report your results for each set of data, and discuss what these results
lead you to conclude about the problem and your particular approach. How
could you possibly improve this approach? You have considerable latitude
in what you discuss, but please make an effort to actually analyze why
your method works or does not. Also discuss the amount of time it took
your method to carry out these experiments, and how that might be improved.
If possible, you should "decompose" your methods such that you can
report results using just one kind of information. For example, how well
does your method fare using just web data? Or just LDOCE? Or just BigMac?
If your method totally integrates these three sources of information,
then of course you can't decompose it. But if you have three fairly
separate techniques that can be isolated in this way, I will expect you
to carry out this kind of analysis and report results for the different
pieces as well as for the whole approach.
Submission Guidelines
When you submit your final project on Tue Dec 17, submit all stages of the
project. Thus, you may still make small changes to your Stage I and Stage
II modules, but in general my expectation is that these will be "fine
tuning" issues and not major changes in functionality over the version
you submitted on Fri Dec 13.
Please put your stage I and stage II modules into directories called LDOCE,
BigMac, and WebReader. Please use exactly those names. Please put those
within a main directory named after your team (please use Alianza, Melgar,
SportBoys, Cristal, and Minas as those names). This main directory should
also include your sentiment.pl program plus your various reports. It
should also include a directory called sanity-data that includes the
sanity check data. Thus, in your main directory you should have the
following:
LDOCE (directory)
BigMac (directory)
WebReader (directory)
sanity-data (directory)
sentiment.pl
analysis.txt
algorithm.txt
Of course you may have other supporting files, documentation, and so
forth. The above is the minimum expectation, and the required
organization and naming convention. Please carefully test the unpacking
and installation of your systems. If I can't unpack, install your modules,
and then run sentiment.pl immediately, your team will lose a significant
number of points.
You must also submit all of the data that you have cached from your web
searches such that I can run your programs using that rather than having
to actually redo the web searches. You must submit this in such a way
that your program can use it automatically without my having to do or
configure anything.
Make sure your team name, individual names, date, etc. are included in your
source code and on all of your submissions. You are encouraged to
submit your modules to the CPAN archive, and I will post your systems on
the class web page after the semester, so please provide appropriate
info about authors, copyrights, distribution, etc.
This is a team assignment. You are strongly advised to divide up the work
of the project into tasks that can be carried out in parallel by various
team members. All team members should be acknowledged in the comments,
etc., and all teammates will receive the same grade. Do not work with
other teams. Each team should operate independently of all the other teams.
Make your own decisions as a team and do not be influenced by the
decisions of other teams if you happen to hear of them accidentally. You
are free to work with your teammates as closely as is necessary.
by:
Ted Pedersen
- tpederse@umn.edu