CS 8761 Natural Language Processing - Fall 2004 - Automated Essay
Grading
Final Project - Final version, due Wed, Dec 22, before noon
This may be revised in response to your questions. Last update Thu
Dec 9, 4pm
Objectives
To design, implement and deploy an essay grading system. Your final
version should incorporate all of my previous comments on your alpha
and beta versions, plus what is described below.
This system should be web based, and include all four of the following
components:
- gibberish detection
- irrelevance measurement
- identification of statements of fact
- accuracy checking of statements of fact
You will submit your system such that I can install and use it, and you
will also have a web site up and running that can be used. Please make
sure that your installation documentation is complete, and includes
instructions on how to install your web based interface. I should be
able to install and run your web based interface with minimum effort
(5-10 minutes at most) by following your instructions. If I am unable
to do this, your project will not receive full credit.
Specification
Your system should have a simple web interface that provides the user
with a prompt (i.e., a question) to answer. This question should be randomly
(and automatically) selected from a file of possible prompts that you
provide. The user should enter their essay, and then submit it for
grading. The system should respond relatively quickly and be interactive.
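The prompt-selection step can be as simple as reading the prompts file and choosing one line at random. A minimal sketch follows (shown in Python for brevity; your actual system may of course be implemented in Perl, and the file name prompts.txt is only an assumption):

```python
import random

def pick_prompt(path="prompts.txt"):
    """Select one prompt at random from a file of prompts, one per line.

    Blank lines are skipped so the prompts file can be formatted loosely.
    """
    with open(path) as f:
        prompts = [line.strip() for line in f if line.strip()]
    return random.choice(prompts)
```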
Make sure that your system has both a user response and a more detailed
trace or diagnostic response. You
should develop a scoring mechanism for your essays that is based
on information collected from the four components. In addition, make sure
you keep a log of activity on your system, which should include the user
identity (via IP address, date, time), the essay written, and the system
feedback. Each essay should result in a separate log file. This log file
should contain enough information to allow us to see exactly what each
component is doing, and how it is making the decisions that it does.
Thus, to summarize, your final version should also include the following
system-level features:
- automatic selection and display of prompt
- option for user or diagnostic display
- logging of user activity, including identity info, essay, and
feedback. Each essay should result in a separate log file that is easy to
access and read.
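As a sketch of the one-log-file-per-essay requirement (again in Python for illustration only; the directory layout and naming scheme are just one possibility, not a prescribed format):

```python
import os, time

def write_essay_log(ip, essay, feedback, log_dir="logs"):
    """Write one log file per essay, named by user IP and timestamp,
    containing the identity info, the essay, and the system feedback."""
    os.makedirs(log_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(log_dir, f"{ip}-{stamp}.log")
    with open(path, "w") as f:
        f.write(f"user: {ip}\ndate: {stamp}\n\n"
                f"ESSAY:\n{essay}\n\nFEEDBACK:\n{feedback}\n")
    return path
```

In practice each component would append its own detailed diagnostics to the FEEDBACK section, so the grader can see exactly how each decision was made.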
Your system should be submitted "pre-trained", so that I can begin
using it immediately. In other words, if you do any calculations based
on large corpora, you should have already done those computations and
provide me with a data file that includes those results. Of course there
may be some dynamic computation required, and that is fine. For example,
you may look up information on the Web based on the user's input. However,
if you use an LSA style approach, please create the co-occurrence matrix
ahead of time, and include that in your distribution.
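For instance, an LSA-style component might compute sentence-level co-occurrence counts offline from the corpus and ship them as a data file, so that the deployed system only loads the file at run time. A rough sketch (in Python for illustration; the pickle format and the file name cooc.pkl are assumptions):

```python
import pickle
from collections import Counter
from itertools import combinations

def build_cooccurrence(corpus_sentences, out_file="cooc.pkl"):
    """Count word pairs that co-occur within a sentence, then save the
    counts to a data file so the deployed system never recomputes them."""
    counts = Counter()
    for sent in corpus_sentences:
        # sorted() gives each unordered pair a single canonical key
        words = sorted(set(sent.lower().split()))
        counts.update(combinations(words, 2))
    with open(out_file, "wb") as f:
        pickle.dump(counts, f)
    return counts
```

The deployed system would then do a single `pickle.load()` at startup, keeping the interactive response fast.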
It is up to your team to decide how you will approach each of these
problems. You may certainly use ideas from the published literature,
just make sure to acknowledge this in your documentation (described
below) by providing citations and references.
Improvements upon Beta Version
Based on my review of your beta systems, I am asking that each team
address the following issues in their final systems.
- Logging - all teams need to improve their logging or diagnostic
information. It should be sufficiently detailed to allow me to see
exactly what information is going into making a decision. This might
turn out to be quite a bit of information, so please make a separate
log file for each essay. I found a few teams that have cumulative log
files, and that gets a bit difficult to work with.
- Implementation Documentation - be more specific with respect to
describing your methods. If you use a list of words to identify facts and
opinions, give that list of words. If you have trained an LSA
co-occurrence matrix, give the dimensions of that matrix. If you have used
SVD, describe how you have chosen the value of k. If you have used any sort of
ratio or scoring method, make sure to give the exact algorithm.
- In your final report, include a section that describes how you arrive
at your final score of 0-6. Give the exact formula or algorithm that leads
you to that score.
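As an illustration of the kind of exact formula expected, here is one hypothetical way to combine four normalized component scores into a 0-6 score. The weights, and the treatment of gibberish and irrelevance as penalties, are assumptions for the sketch, not a prescribed method:

```python
def overall_score(gibberish, irrelevance, fact_count, fact_accuracy,
                  weights=(0.3, 0.3, 0.2, 0.2)):
    """Combine four component scores, each normalized to 0..1, into a
    0-6 essay score.  Gibberish and irrelevance are penalties (higher is
    worse), so they are inverted; the weights are illustrative only."""
    components = (1 - gibberish, 1 - irrelevance, fact_count, fact_accuracy)
    raw = sum(w * c for w, c in zip(weights, components))  # still in 0..1
    return round(6 * raw)
```

Whatever formula you adopt, state it this explicitly in your report, so that a reader can reproduce any score by hand.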
- Evaluation - all teams need to "beef up" their evaluation. How do you
show that your system is working as advertised? I'd like to see evaluation
of each component (test cases where you know the right answer, and then
you show whether or not your system handles them) and also overall essay
grading. In the latter case, I'd like to see you take 5 to 10 previously
scored essays, and compare your system's score for each of those to the
existing scores.
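One simple way to report this comparison is exact and adjacent agreement between system and human scores, which is a common measure in the essay-grading literature. A sketch (in Python for illustration):

```python
def agreement(system_scores, human_scores):
    """Return the fraction of essays where the system score matches the
    human score exactly, and where it is within one point (adjacent)."""
    pairs = list(zip(system_scores, human_scores))
    exact = sum(s == h for s, h in pairs) / len(pairs)
    adjacent = sum(abs(s - h) <= 1 for s, h in pairs) / len(pairs)
    return exact, adjacent
```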
- Related work/Introduction - extend the related work discussion to
include at least 2 systems that are not PEG, Writer's Workbench, IEA,
E-rater/Criterion. In other words, identify and describe two essay grading
systems that we have not talked about in the class already. Make sure to
provide complete references to original papers or articles that describe
each of these systems. Make sure these references are complete and
formatted in a standard fashion. You can use the bibliography in our
textbook as an example of the kind of information that should be included
in a reference. References that consist purely of a URL are to be
discouraged.
- Proofreading - make sure to proofread your final reports carefully.
Check the spelling of names, and make sure that you get the names of
historically important systems correct. Also be on the lookout for context
sensitive spelling mistakes - these are real words that are still
wrong. A common example seems to be ratio, which is often misspelled as
ration.
- Test installation instructions - most of the installation
instructions appear to be incomplete or possibly confusing. Please have
one of your teammates work through the installation as described and
actually install and run your system on a "fresh" system. This is the only
way to make sure your installation instructions are complete. I will be
installing your final systems on my ~/www space, and if I am unable to do
so it will hurt your team score.
Installation
The installation of your system should be simple, and follow the standard
"4 step Perl" install. These steps consist of the following:
perl Makefile.PL PREFIX=/home/cs/tpederse etc.
make
make test
make install
Note that if you rely on existing tools (like the Brill Part of Speech
Tagger, WordNet, etc.) you can simply provide me with detailed
instructions about where I can find those tools, and how to install them.
Do not assume that I know how to install those, and do not assume that I
already have them available. Also, do not simply refer me to the
instructions in that package, please provide a concise set of instructions
that includes any and all tools that might need to be available.
Note that the PREFIX variable indicates that I will not have supervisor
access, and I will install into my own personal directory. It is fine if
there are other directives required for Makefile.PL, simply indicate what
they are in your INSTALL file.
Your distribution should include a plain text file named INSTALL that
provides detailed instructions on how to install your system. This file
should be in your top level directory. You should
assume that the user of your system only has this documentation
available, and is not an expert in system administration. Thus, if any
other tools need to be installed, paths need to be set, Makefile.PL
variables need to be set, etc. please provide detailed instructions
about how to do that. Assume that your user does not have supervisor
access! Please revise these instructions based on actual tests you
run with each other installing the complete system, to make sure
that a new user can successfully install your system.
Documentation
Your distribution should include a file named README.pod that is written
in perldoc format and describes your overall system. This should be in
your top level directory.
You should assume that the user has no specialized knowledge of automated
essay grading. Thus, your README.pod should begin with a general
introduction to the problem of automated essay grading. This should
include a brief description of the history and related work in this area,
and it should be written in the style and form of a related work
section in a thesis or thesis proposal. Please work on making your writing
more formal, and pay particular attention to the introductory remarks that
motivate why automated essay scoring is both feasible and useful. In
addition, continue to develop and expand your historical review. Do not
limit this discussion to systems that have been mentioned in class or in
the readings already assigned; try to identify new systems and background
papers. Make sure you provide references to each of the systems that you
discuss. This is so that your reader will know where to go to obtain
additional information about a system if they so desire.
You should then introduce each of the four problems described above. In
other words, do not assume that the user will know what gibberish
detection or relevance measurement is. Provide examples of text that
would "trigger" your system to provide guidance to the essay writer. You
should do this for all four systems. Make sure that you extend your
discussion to include additional examples that illustrate the specific
issues that you encounter when you address these problems. Provide
examples of problem cases (where it's not clear what you should do) as
well as more obvious cases. Also, you should start to discuss the
interaction between the problems represented by these components. For
example, does gibberish have any effect on fact checking, or does
relevance have an impact on gibberish detection? In other words, while you
have separate components for each problem, discuss in general terms how
they might interact with each other.
Then, you should describe the specific approach that you are taking for
each of the four components, and also describe your overall plan for each
of them. You should clearly indicate what possible approaches you
are considering for each component, and also describe who is going to do
what. For all of the modules, make sure you discuss the evaluation of
the module from alpha to beta to final versions, and what you have
learned and observed about each along the way.
You should address the issue of how you will evaluate your system. Make
sure to follow the guidelines above, and feel free to add onto that.
The important thing is to demonstrate that your individual components
are working, and that your overall scoring mechanism is reasonable.
Finally, compare and contrast your proposed approach with existing
techniques, and clearly credit any publications, systems, etc. that might
have given you ideas.
To summarize, your README.pod should consist of the following sections:
- INTRODUCTION TO AUTOMATED ESSAY GRADING (motivations, history, and
related work. provide references in standard form to previous work. you
can use the bibliography in our textbook as an example of what your
references should consist of.)
- OBJECTIVE (describe each problem /component that we hope to address,
and give examples both of the problem and the types of situations you
hope to be able to handle)
- IMPLEMENTATION OF APPROACHES (describe in detail the methods
you have implemented in this release for all four components. make sure
to address how your alpha and beta versions have been improved upon.)
- OVERALL PROPOSED SOLUTION (describe your solution for each of
the four components, and indicate who is going to do what)
- SCORING METHOD (describe how you arrive at a 0-6 score for an essay).
- EVALUATION (describe how you showed that your method is
actually working as intended, and that it provides meaningful scores to
users.)
- RELATION TO PREVIOUS WORK (specifically describe how your system
relates to others that already exist, and make sure to credit where your
ideas come from.)
Submission Guidelines
You should package your system as a single compressed tar file that is
named as TEAMNAME-1.00.tar.gz. This should include all of the code
needed to run your system, and the README.pod and INSTALL
files as described. If you are using CPAN modules in your system, make
sure those dependencies are described in your Makefile.PL, and I will
obtain those modules via CPAN. The same is true of other outside packages.
If they are not available via CPAN (such as the Brill Tagger, WordNet)
provide detailed instructions as to where I can find those and how to
install them. You do not need to include those in your distribution.
Your system should unpack into a directory named
as TEAMNAME-1.00. TEAMNAME should be your assigned team name, written in
all capital letters. If there are embedded spaces in your name (e.g.,
BOCA JUNIORS) you should replace each space with a -, resulting in
BOCA-JUNIORS-1.00.
If you are using SourceForge, upload your system and I will download
from there. Please note that SourceForge provides a time stamp when
you upload. In your README, please prominently provide a URL where I
can run your system. Please note that I will also install it on my own
system as well.
Finally, make sure that you specifically address the issue of "terms
of use". In
the end, we will be posting your code on the Web, and you should
specifically address the issue of how the code can be used and
distributed. One option is the GNU Copyleft. You are free to
investigate other licensing terms, but you must specifically address
this issue and follow the accepted standards for including licensing
information in your code and documentation. Also make sure to clearly
identify the authors of individual programs, and provide contact
information. If one person does all the work for a particular component,
then their name should be the only one on that particular component. In
cases where the effort is shared, then all team members should be
credited.
Policies
This is a team assignment. You are strongly advised to divide up the
work of the project into tasks that can be carried out in parallel by
various team members. All team members should be acknowledged in the
comments, etc. and all teammates will receive the same grade. Do not work
with other teams, and do not discuss your approach with other
teams. Each team should operate independently of all the other
teams. Make your own decisions as a team and do not be influenced by
the decisions of other teams if you happen to hear of them
accidentally.
by:
Ted Pedersen
- tpederse@umn.edu