The Project can be accessed online at http://egal.sourceforge.net
Developers : Ajit Datar, Nagendra Doddapaneni, Sudip Khanna, Varsha Kodali and Archna Yadav.
As the saying goes, the three R's of education are Reading, wRiting and aRithmetic. Writing has always been an important part of education, both when it is taught and when it is evaluated. As more and more tasks are automated, the automatic evaluation of essays has been an open problem for a long time. Systems to evaluate essays automatically have been developed since the late 1960s. One of the earliest was Project Essay Grader (PEG)[1]. Most of the factors that PEG considered were indirect, such as the number of commas, the number of prepositions and the number of uncommon words. These measures stood in for the intrinsic qualities of writing rather than assessing them directly, and so the system was not widely accepted.
After PEG, in the 1980s, there was an essay evaluation system called the
Writer's Work Bench tool (WWB),
which gave writers feedback
about various aspects such as spelling, diction and readability. The
system could check spelling and point out frequently
misused words. But because WWB did not perform semantic analysis, it
was limited in scope as an evaluation tool. During the same period,
Tom Landauer and his associates came up with an approach called Latent
Semantic Analysis (LSA)[13][20]. LSA-based
techniques use a ``bag of words'' approach, wherein word order is not
considered important, but similarity between words is, and it is measured
using a co-occurrence matrix and singular value decomposition. Landauer
and his associates carried research into the approach further and developed
the Intelligent Essay Assessor (IEA)[14] system.
Another system that followed PEG was E-RATER[15][16], developed at ETS. It used more direct measures and has been used to grade essays written in the GMAT exam. The measures used by E-RATER are more comprehensive and include the document structure. A student's response is measured for similarity with a training set of essays and, based on this similarity value, given a score on a scale from 1 to 6.
Later, in 1999, Christie designed a new essay grading system, SEAR (Schema Extract Analyse and Report)[17], which was based on a new parameter: the style of the essay rather than its content. It required initial training and calibration on a set of essays. It also had reference content, called the content schema, which was represented as a data structure and could be revised as and when required. SEAR had its pros and cons, but it inspired others to consider the style of the essay. In the following year, three students, Ming, Mikhailov and Kuan at Ngee Ann Polytechnic, came up with an essay grading system called Intelligent Essay Marking Systems (IEMS)[18], which was based on the Pattern Indexing Neural Network (Indextron), a clustering algorithm that can be implemented over a neural network. IEMS was praised for its instant feedback, which shows students where they performed well and the specifics of their mistakes. In 2002, after three years of research and development, Mitchell, Russel, Broomhead and Aldridge came up with software called Automark[19], intended for marking answers to open-ended questions. It is based on NLP methods and incorporates spell checking, semantic checking and support for punctuation checking. It looks for specific content in free-text answers; this content takes the form of templates which indicate correct or incorrect answers. Automark was another step in the direction of automatic text assessment.
In our system, we consider four areas in the evaluation of an essay: Gibberish Detection, Irrelevance Measure, Identifying Statements of Fact, and Checking Statements of Fact for their accuracy.
Let us discuss each of these modules in turn. To start with, there can be two types of gibberish sentences: syntactic gibberish and semantic gibberish. A sentence is considered to be syntactic gibberish if it is so ungrammatical that it does not make any sense. A syntactically gibberish sentence is thus not merely ungrammatical; it is so ungrammatical that its meaning cannot be recovered. Consider the sentence ``I go the market.'' It is ungrammatical, since the preposition ``to'' is missing. Even so, we can still understand what is being said. Now consider the sentence ``At flying no market ran.'' This too is ungrammatical, but it does not make any sense. Hence, it is considered to be syntactic gibberish. Semantic gibberish is different. A sentence like ``Colorless green ideas sleep furiously'' is considered semantically gibberish: such sentences are grammatical but do not have any meaning. The more gibberish sentences an essay contains, the less command of written English the student demonstrates, especially when writing in response to a prompt/topic.
Moving on to the next measure, we use an irrelevance measure to decide whether a sentence is relevant to the given topic. Given a topic/prompt for the essay, a sentence from the student's response is considered irrelevant if it does not relate to the topic. To decide this automatically, we compute the semantic similarity of the sentence with the topic text, using WordNet. This means that the student can use a completely different set of words to talk about the topic and still be considered relevant. For example, say the prompt asks the student to write about the ``Importance of class room teaching versus learning from home'' and the student writes ``Education is very important for one to succeed in life.'' This is considered relevant, because the words 'education', 'teaching' and 'learning' are semantically close to each other. If instead the student writes ``The sun is the biggest star in the universe'', the sentence is irrelevant, because it does not talk about the topic. The more irrelevant sentences an essay contains, the less understanding of the topic the student shows.
The next measure is to identify statements of fact in a student's response to an essay topic. A sentence is considered to be a fact, as opposed to an opinion, if the following four properties hold: (i) the information presented is unique and one of a kind, (ii) the information presented is concrete, (iii) the information presented is a statement or proposition, and (iv) the information presented is an association between more than one concept[2]. If a sentence contains such information, we can identify it as a statement of fact. Personal statements, on the other hand, are not concrete and hence cannot be facts; such statements are opinions. Statements of fact are thus identified in contrast to statements of opinion. Questions and imperative statements also form a category apart from statements of fact. If a sentence contains comparative adjectives like 'smarter' or superlative adjectives like 'smartest', or comparative adverbs like 'faster' or superlative adverbs like 'fastest', then the sentence is not a statement of fact, because such a sentence is not concrete. Sentences about the future are not propositions either, as there is an element of uncertainty associated with the future, and hence such sentences cannot be statements of fact. Let us consider an example. If the topic is the one about teaching from the previous section and the student writes ``Learning from home started in USA from 1965'', the sentence is considered to be a statement of fact, irrespective of whether it is relevant or not. The statement contains information which is unique (1965) and concrete ('something starting'), and has no words that can be interpreted in more than one way in different contexts. A sentence like ``Learning in class rooms is good'', by contrast, is not a statement of fact but a statement of opinion, because what seems 'good' to one person might not seem so to another.
Thus personal statements cannot be considered statements of fact. The more facts are stated, the better we can evaluate what the student envisions about the topic at hand, by relating the topic to concrete statements, i.e. facts.
The final measure used for evaluating essays is checking statements of fact for accuracy. Once the statements of fact have been identified, we check whether they are accurate; if they are, they contribute a boost to the score. Consider the topic about learning from the previous measure. Suppose the student writes ``Class room teaching started in 1965 in the USA'' when it in fact started in 2000. Then, although the sentence is a statement of fact, it is not accurate. Inaccurate facts lower the score, while accurate facts raise it.
The four measures implemented are Gibberish Detection, Irrelevance Measurement, Identifying statements of fact and checking for Accuracy of Statements of Fact. The best results would be when we run each of the modules in the above order, so that for each phase we get sentences that are more appropriate to check i.e. if a sentence is gibberish, then there is no point in checking to see if it is relevant or not, and if a sentence is irrelevant then there is no point in checking to see if it is a statement of fact or not.
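The ordering described above can be sketched as a short cascade. This is a minimal illustration in Python, not the project's own code: the function name classify_sentence, the three predicate arguments and the label "other" are hypothetical stand-ins for the real modules.

```python
from typing import Callable


def classify_sentence(
    sentence: str,
    is_gibberish: Callable[[str], bool],
    is_irrelevant: Callable[[str], bool],
    is_fact: Callable[[str], bool],
) -> str:
    """Run the checks in the order described and stop at the first that applies.

    The three predicates are hypothetical stand-ins for the real modules.
    """
    if is_gibberish(sentence):
        return "gibberish"   # no point in checking relevance
    if is_irrelevant(sentence):
        return "irrelevant"  # no point in checking for a statement of fact
    if is_fact(sentence):
        return "fact"        # only these would go on to the accuracy check
    return "other"
```

Because each check returns early, the later and more expensive checks only ever see sentences that passed the earlier ones.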
Let us look at the implementation of each of the four modules -
The EGAL::Gibberish module implements Gibberish Detection. The method implemented is
isGibberish()
- This method determines whether a given sentence is
gibberish. It first checks whether the sentence is semantic
gibberish, as follows -
The module first strips the stop words from the sentence. Stop words
are function words that contribute to the grammar alone and not to the
meaning of the sentence. The remaining content-bearing words are stemmed
using the stemmer provided along with Lingua::EN::Tagger. If there are
at least three content-bearing words in the
sentence, then for all such words, using the similarity metric described
later, we compute how similar each word is to every other word. Thus if
we have N words in the sentence, we end up with an NxN matrix. Then we
compute the average of the entries in the matrix to determine the
semantic coherence of the sentence, a value between 0 and 1. The
complement of this value, scaled to the range 0 to 100, is the percentage
gibberishness of the sentence. If the scaled score is above an
empirically decided threshold of 90%, then the sentence is flagged as
semantic gibberish.
Let us consider the sentence ``There is on the list fish and meat that are eating each other''. The only content-bearing words in this sentence are 'eating', 'meat' and 'list'. The semantic similarities between these words are eating-meat: 0.1581, eating-list: 0, and meat-list: 0.
Averaging the three pairwise values gives 0.1581/3 = 0.0527 for the semantic coherence of this sentence. Percentage gibberishness is calculated as (1-0.0527)*100, i.e. 94.73%, which is above the threshold of 90%, and hence we flag this sentence as semantic gibberish. The threshold we have used has been experimentally derived. After working on a list of sentences which were assessed by humans as gibberish/non-gibberish, the system was tuned to be consistent with those judgments by setting the threshold accordingly. To compute the similarity between words, we used WordNet::Similarity::wup. The measure is used for words found in WordNet. For words not found in the knowledge base, a similarity value of 0 is used, which results in a reduction of semantic coherence and therefore an increase in gibberishness, which is complementary to semantic coherence. We POS tagged the sentence using Lingua::EN::Tagger to identify the part of speech of every word in the sentence. While computing the similarity between words, we considered all the senses of a word for the part of speech with which it was used in the sentence. Since we know the part of speech of a word, we are able to distinguish the sense of the word ``I'' as used in the sentence ``I am going there.'', where ``I'' is a pronoun, from its sense in a sentence like ``The element I is found in the periodic table'', where ``I'' is a noun, as in Iodine. This enables us to handle some of the surprising words found in WordNet, based on the part of speech of the word in the sentence.
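The arithmetic of this worked example can be reproduced with a small Python sketch. It is not the EGAL::Gibberish code: the function names are hypothetical, the similarity function is a lookup table holding only the three values quoted in the example (standing in for WordNet::Similarity::wup), and the average is taken over the unordered word pairs, which is an assumption that matches the figure of 0.0527. Returning None for fewer than three content words is also an assumption, since the text does not say what happens in that case.

```python
from itertools import combinations

SEMANTIC_THRESHOLD = 90.0  # percent, the empirically chosen value from the text


def percent_gibberish(content_words, similarity):
    """Return the percentage gibberishness, or None for fewer than three words.

    Averages the similarity over unordered word pairs (an assumption that
    matches the worked example), then takes the complement on a 0-100 scale.
    """
    if len(content_words) < 3:
        return None  # assumption: the text does not define this case
    pairs = list(combinations(content_words, 2))
    coherence = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return (1.0 - coherence) * 100.0


# Stand-in for WordNet::Similarity::wup, holding only the values from the example.
TOY_SIMILARITY = {frozenset(("eating", "meat")): 0.1581}


def toy_similarity(a, b):
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)  # unknown pairs score 0


score = percent_gibberish(["eating", "meat", "list"], toy_similarity)
print(round(score, 2), score > SEMANTIC_THRESHOLD)  # 94.73 True
```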
If the score is below the threshold (the sentence is not semantic gibberish), we check whether the sentence is syntactic gibberish. We compute a score, the percentage of unknown words and unused links in the sentence, and if it is more than an empirical threshold we say that the sentence is syntactic gibberish. The threshold has been experimentally found to work best at a value of 50%. A syntactic gibberish sentence has a lot of unused links according to Lingua::LinkParser, the grammar parser used to parse sentences. If the score is 0, the sentence is grammatical. If it is a relatively small value, the sentence is ungrammatical but not gibberish, and if it is high, the sentence is considered to be syntactic gibberish. The method takes a sentence from the student's response to a topic as input and returns an array. Index 0 contains a 1 if the sentence is gibberish and a 0 otherwise, index 1 contains the gibberish score associated with the sentence, and index 2 contains a detailed trace of execution.
The alpha version of the module identified only syntactically gibberish sentences. To flag a sentence as syntactic gibberish, we considered the absolute number of unknown words and unused links. If that value was greater than 3, we termed the sentence gibberish. But this threshold of 3 stayed the same whether the sentence was of length 5 or length 25. In the beta version, the module was also able to identify semantically gibberish sentences as described above. The final version improves on the beta version by replacing the absolute threshold for syntactic gibberish with a percentage of unknown words/unused links. Thus if such words/links are 3 out of 5, the percentage is 60% and we flag the sentence as syntactic gibberish. If such words/links are 3 out of 25, the percentage is 12% and we do not flag the sentence as syntactic gibberish, even though there are 3 such words/links.
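The percentage threshold can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the denominator is taken to be the sentence length in words (which agrees with the 3-out-of-5 and 3-out-of-25 examples), and the count of unknown words and unused links is passed in rather than obtained from Lingua::LinkParser.

```python
SYNTACTIC_THRESHOLD = 50.0  # percent, the empirically chosen value from the text


def syntactic_gibberish(problem_count, sentence_length,
                        threshold=SYNTACTIC_THRESHOLD):
    """Return (flag, score) for a count of unknown words and unused links.

    sentence_length is assumed to be the number of words in the sentence.
    """
    score = 100.0 * problem_count / sentence_length
    return score > threshold, score


print(syntactic_gibberish(3, 5))   # (True, 60.0)
print(syntactic_gibberish(3, 25))  # (False, 12.0)
```

The same count of 3 is flagged in the short sentence and not in the long one, which is the behaviour the absolute threshold could not provide.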
The EGAL::Relevance module implements Relevance Measure. The method implemented is
isIrrelevant()
- This method takes a sentence from the student's
response as input. It then computes the relevance of the sentence to
the topic/prompt. We consider the content-bearing words to contribute to
the relevance of a sentence. We take each of the content-bearing
words from the input sentence and compare it with every
word in the topic. For every word in the topic, we find the maximum
similarity that the word has with any word in the input sentence,
using WordNet::Similarity::wup. We then compute the average of these
maximum similarities over the words in the topic. The complement of
this value, scaled to the range 0 to 100, is the percentage irrelevance of the
sentence. If this score is above an empirically decided threshold, then
the sentence is considered to be irrelevant. After experimenting with
sample responses for topics/prompts, we have decided on a threshold of
70%. With this threshold, most of the sentences that a human assessor
would evaluate as irrelevant are detected as irrelevant. This
method returns an array. Index 0 contains a 2 if the sentence
is irrelevant and a 0 otherwise. Index 1 contains the
irrelevance score between 0 and 100. Index 2 contains the
detailed execution trace.
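The computation just described can be sketched in Python as follows. This is an illustration, not the EGAL::Relevance code: the function names are hypothetical, both word lists are assumed to be non-empty, and the similarity values in the lookup table are invented for the example (they are not real WordNet::Similarity::wup scores).

```python
IRRELEVANCE_THRESHOLD = 70.0  # percent, the empirically chosen value from the text


def percent_irrelevance(sentence_words, topic_words, similarity):
    """Average, over topic words, the best match in the sentence; return the
    complement on a 0-100 scale. Both lists are assumed to be non-empty."""
    best = [max(similarity(t, s) for s in sentence_words) for t in topic_words]
    relevance = sum(best) / len(best)
    return (1.0 - relevance) * 100.0


# Invented similarity values, for illustration only.
TOY_SIMILARITY = {
    frozenset(("teaching", "education")): 0.8,
    frozenset(("learning", "education")): 0.6,
}


def toy_similarity(a, b):
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


irrelevance = percent_irrelevance(
    ["education", "life"], ["teaching", "learning"], toy_similarity
)
is_irrelevant = irrelevance > IRRELEVANCE_THRESHOLD
```

With these invented values the sentence scores roughly 30% irrelevance and is therefore kept as relevant.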
The alpha version of the module compared words from the student's response to words in the topic and in a ``gold standard essay''. These ``gold standard essays'' were written by an expert in the field, one for every essay topic that the system would handle. The word match also considered synonymous words by using Thesaurus.com, but the module was still looking for a word match, which is not necessarily a semantic match. For the beta version, therefore, the module found the semantic similarity between a sentence from the student's essay and the topic by using WordNet::Similarity::wup. This also eliminated the problem of maintaining ``gold standard essays''. The performance of the module was found to be satisfactory, and hence it has been retained for the final version.
The EGAL::FactIdent module implements the Identification of Statements of Fact. The methods implemented are
fill_stop_hash - this method fills a hash with value words; a sentence containing such words is an opinion, not a fact.
fill_fact_hash - this method fills a hash with fact words; a sentence containing such words is a fact.
Both are called as a part of the constructor.
isFact - this method takes a sentence from the student's response to a topic/prompt. It first checks whether the sentence contains a comparative adjective like 'smarter', a superlative adjective like 'smartest', a comparative adverb like 'higher' or a superlative adverb like 'highest'. If the sentence contains such a word, it is identified not to be a statement of fact. Next, the method checks whether the sentence is in any form of future tense. If it is, the sentence is identified not to be a statement of fact. Then it checks whether the sentence is a question; if it is, it is not a statement, let alone a statement of fact. Then it checks whether the sentence contains value words or fact words. If it does, the sentence is identified to be a statement of opinion or a statement of fact respectively. The idea that value words make a sentence a statement of opinion comes from John Langan[6], in a discussion about distinguishing facts from opinions. We obtained the list of value words from a project that worked on identifying such words[7]. The lists of value words and fact words are available at links [21] and [22] respectively. If the statement contains statistics, it is also identified to be a statement of fact. Finally, since we identify statements of fact by elimination, all other sentences, which do not meet any of the above conditions, are treated as statements of fact. This method also returns an array. Index 0 contains a 4 if the sentence is a statement of fact and a 0 otherwise. Index 1 always contains a 0, as there is no concept of a score associated with identifying statements of fact. Index 2 contains a message indicating whether the sentence is a statement of fact; if it is not, the message gives the reason.
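The order of these checks can be sketched in Python. Everything in the sketch is a simplification and not the EGAL::FactIdent code: the input is assumed to be already POS tagged with Penn Treebank-style tags, the word sets are tiny placeholders (the real lists are the ones at links [21] and [22]), and detecting future tense by the words 'will' and 'shall' is a crude assumption made only to keep the example short.

```python
COMPARATIVE_TAGS = {"JJR", "JJS", "RBR", "RBS"}  # comparative/superlative adj./adv.
FUTURE_MARKERS = {"will", "shall"}               # crude assumption for future tense
VALUE_WORDS = {"good"}                           # placeholder for the value-word list
FACT_WORDS = {"first"}                           # placeholder for the fact-word list


def is_fact(tagged):
    """tagged is a list of (word, tag) pairs; returns (status, score, message).

    status is 4 for a statement of fact and 0 otherwise; score is always 0.
    """
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    if any(t in COMPARATIVE_TAGS for t in tags):
        return (0, 0, "not a fact: comparative or superlative word")
    if any(w in FUTURE_MARKERS for w in words):
        return (0, 0, "not a fact: future tense")
    if words and words[-1] == "?":
        return (0, 0, "not a fact: question")
    if any(w in VALUE_WORDS for w in words):
        return (0, 0, "not a fact: value word, so an opinion")
    if any(w in FACT_WORDS for w in words) or "CD" in tags:
        return (4, 0, "statement of fact: fact word or statistic")
    return (4, 0, "statement of fact: by elimination")


opinion = [("Learning", "NN"), ("in", "IN"), ("class", "NN"),
           ("rooms", "NNS"), ("is", "VBZ"), ("good", "JJ")]
fact = [("Learning", "NN"), ("from", "IN"), ("home", "NN"), ("started", "VBD"),
        ("in", "IN"), ("USA", "NNP"), ("from", "IN"), ("1965", "CD")]
```

Calling is_fact(opinion) returns status 0 because of the value word 'good', while is_fact(fact) returns status 4 because of the number 1965.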
The alpha version of the module identified value words in sentences. Sentences with value words were identified as statements of opinion. Sentences that contained fact words were identified as statements of fact, as were sentences that contained statistics. For the beta version, sentences with comparative/superlative adjectives/adverbs were identified not to be statements of fact, and so were sentences in future tense. The module was found to work satisfactorily, and hence it has been retained for the final version.
The EGAL::FactCheck module checks the statements of facts for accuracy. The method implemented is
check_fact()
- This method takes a sentence from the student's
response to a prompt/topic. A linkParser[4] tree for the sentence
is built. Using regular expressions, the subjects of all the verbs are
identified. A Google query is then constructed to search en.wikipedia.org
[3] using the subjects and any proper nouns in the sentence. If there
are no proper nouns, all noun phrases are used. The first Google result
is taken and the full text associated with this Wikipedia entry is retrieved.
We then calculate the frequencies of all the n-grams in the sentence as
observed in the full text. We have considered n-grams of at most length 3
and applied Witten-Bell smoothing to the obtained frequency list. A
confidence measure is calculated for each sentence. The measure uses the
following quantities. 'n' refers to the number of words in
the n-gram. 'count' refers to the number of times the n-gram has occurred
in the wikitext/sentence. 'N' refers to the maximum n-gram length
considered, which in our case is 3. Based on whether the count
is positive or negative, we add or subtract n*(log(count) + N)
respectively to the 'measure'. This is done for each of the n-grams in
the sentence. Also, if count is positive, n*(log(count) + N) is
accumulated into 'max', and into 'min' otherwise. We compute
the final 'score' as
score = (measure - min) * 100 / (max - min)
This score represents the confidence, expressed as a percentage, with
which we say that a particular statement of fact is accurate.
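The final normalisation step can be written out as a small Python function. This sketch covers only the last formula; it does not reproduce the n-gram counting or the smoothing. The function and parameter names are hypothetical ('lowest' and 'highest' play the roles of 'min' and 'max'), and returning None when the two bounds coincide is an assumption, since the text does not cover that case.

```python
def confidence_percent(measure, lowest, highest):
    """Rescale measure from the range [lowest, highest] to a percentage."""
    if highest == lowest:
        return None  # assumption: the text does not define this case
    return (measure - lowest) * 100.0 / (highest - lowest)


# A measure three quarters of the way from the lowest to the highest bound.
print(confidence_percent(5, -10, 10))  # 75.0
```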
This module was introduced in the beta version of the project. It was found to work satisfactorily, except for the return status of facts. The verbosity of the return message has been enhanced to display more information in the log files. With this modification, the module was ready for the final version.
Following is a list of tasks and the group members who worked on them -
Gibberish Detection - Sudip Khanna and Ajit Datar
Irrelevance Measure - Sudip Khanna
Identifying Statements of Fact - Nagendra Doddapaneni
Checking Accuracy of Statements of Fact - Ajit Datar and Sudip Khanna
Web-Interface - Archana Yadav and Ajit Datar
Installation and Administration - Ajit Datar
Documentation - Nagendra Doddapaneni and Varsha Kodali
Literary Review - Nagendra Doddapaneni and Varsha Kodali
As we have seen in the previous section, for Gibberish Detection - EGAL::Gibberish identifies sentences as semantic gibberish based on the similarity score given by WordNet::Similarity::wup. If the percentage gibberishness is more than a particular threshold, the sentence is said to be semantic gibberish. Otherwise EGAL::Gibberish checks whether the sentence is syntactic gibberish, based on the number of unused links and unknown words in the sentence. If this count is relatively high, the sentence is considered to be syntactic gibberish.
Irrelevance Measure - EGAL::Relevance identifies sentences as relevant based on the similarity score calculated using WordNet::Similarity::wup. If the percentage irrelevance is more than a threshold, the sentence is considered to be irrelevant, and relevant otherwise.
Identifying Statements of Fact - EGAL::FactIdent identifies sentences as statements of fact or not based on the occurrence of fact words and value words respectively. If value words occur in a sentence from the student's response, the sentence is considered to be an opinion, because value words are mostly used to express a personal view. If fact words such as ``first'' or a number occur in a sentence which is not gibberish and is relevant, the sentence is considered to be a statement of fact. Questions are not statements themselves and hence are not statements of fact. Sentences with comparative/superlative adjectives/adverbs are not considered to be statements of fact either, as they are not concrete. If the sentence is in any form of future tense, it is again not a statement of fact, as it is not a proposition.
Checking Statement of Fact for Accuracy - EGAL::FactCheck determines the accuracy of the statements of fact. As mentioned before, this module first constructs a LinkParser tree for the sentence from the student's response to a prompt and identifies the subject using regular expressions. Then a Google query restricted to en.wikipedia.org is constructed using the subject and proper nouns/noun phrases. Using the first Google result, the full text of this Wikipedia record is retrieved. To check the accuracy of the statement, we develop a score based on the statistics obtained from considering all n-grams from the sentence and applying Witten-Bell smoothing to the frequency of such n-grams as observed in the wikitext. Presently n-grams of length up to 3 are considered. Using this score, we can say that the given statement of fact is accurate with score% confidence.
The four modules that we have implemented give us the percentage of gibberish sentences, irrelevant sentences and statements of fact, along with the accuracy of the latter. Even with this information in hand, developing a scoring mechanism consistent with the scores given by a human grader is difficult. We therefore came up with a measure which uses as much information as can be derived from the statistics the modules offer.
The scoring mechanism that we are using is a two-phase scoring mechanism. During the initial phase, we consider two bins of sentences: a bin of good sentences, which contains statements of fact and ordinary sentences, and a bin of bad sentences, which contains gibberish sentences and irrelevant sentences. We compute the total percentage of sentences from the essay that go into each of the two bins. For example, suppose that 10% of the sentences in the essay are gibberish, 20% are irrelevant, 30% are statements of fact and the remaining 40% are ordinary sentences. Then the total percentage of sentences in the good bin is 70%, while the total percentage of sentences in the bad bin is 30%. If the good bin has a higher percentage of sentences than the bad bin, as is the case here, we consider the score of the essay to be either 3, 4, 5 or 6. The higher the percentage of sentences in the good bin, the higher the score. For the given example, a score of say 4 is assigned. If there were a higher percentage of sentences in the bad bin, a score of either 0, 1, 2 or 3 would be assigned. Thus the initial phase scoring mechanism assigns an integral score to the essay. The final phase scoring mechanism considers all the statements of fact. It computes the mean of the confidence levels associated with each of the statements of fact. If the mean is greater than 50, the score assigned by the initial phase scoring mechanism is incremented by a fraction. If the mean is less than 50, the score is penalised instead of boosted; the size of the penalty depends on the confidence level of the facts. For example, suppose there were two statements of fact in the essay, one verified with a 62% confidence level and the other with a 78% confidence level. Then the mean of the confidence levels would be (62+78)/2, which is 70%.
Since this value is greater than the threshold of 50%, the score of 4 assigned by the initial phase scoring mechanism is incremented by the final phase scoring mechanism. Say a score boost of 0.6 is given. The final score is the sum of the score assigned by the initial phase scoring mechanism and the boost given by the final phase scoring mechanism. So for the example that we considered, the final score would be 4.0 + 0.6 = 4.6. This final value is rounded off to the nearest half number, so 4.6 is rounded off to 4.5. In the final phase mechanism, the score boost grows steeply with the mean: for a mean of 60% the boost is small, while for a mean of 95% it is much larger. The function [((mean-50)*1.26)/50]^3, a cubic rather than a linear curve, has been experimentally found to work well for scoring the essays.
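The quantities in this example can be worked through in Python. This is a sketch, not the system's scoring code: all function names are hypothetical, the mapping from bin percentages to the integral score is not reproduced because the text does not specify it, and the boost function is defined only for a mean of 50 or more because the size of the penalty below 50 is not given. The boost of 0.6 in the example is an illustrative figure; the quoted formula itself evaluates to about 0.13 for a mean of 70.

```python
import math


def bin_percentages(gibberish, irrelevant, facts, ordinary):
    """Return (good, bad) bin percentages for the initial phase."""
    return facts + ordinary, gibberish + irrelevant


def score_boost(mean_confidence):
    """Boost from the formula [((mean-50)*1.26)/50]^3, for a mean of 50 or more."""
    if mean_confidence < 50:
        raise ValueError("the penalty below 50 is not specified in the text")
    return ((mean_confidence - 50.0) * 1.26 / 50.0) ** 3


def round_to_half(score):
    """Round to the nearest half number, rounding exact ties upwards."""
    return math.floor(score * 2 + 0.5) / 2


good, bad = bin_percentages(10, 20, 30, 40)  # 70 and 30
mean = (62 + 78) / 2                         # 70.0
boost = score_boost(mean)                    # about 0.128
illustrative_final = round_to_half(4.0 + 0.6)  # 4.5, as in the example
```

Rounding ties upwards is a choice made for the sketch; the text only says that the value is rounded to the nearest half number.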
As a part of evaluation, we have in place modules which check whether a given sentence is gibberish; if not, whether it is irrelevant; if not, whether it is a statement of fact; and if it is, whether it is accurate. The results of running the system over sample input are given below.
The following are the sentences that have been successfully identified as gibberish by the EGAL::Gibberish module -
"Karma art have play dance under carpet." as 57.14% syntactic gibberish.
"Better after so far observe have peers obviate." as 62.50% syntactic gibberish.
"Lush yellow bright grey amongst intelligence over scale pendant." as 88.89% syntactic gibberish.
"There is on the list fish and meat that are eating each other." as 94.87% syntactic gibberish.
"This is the war of 1969, we intend to discuss in the class tommorow" as 93.73% semantic gibberish.
"Most lovely ladies are running under the ocean." has not been identified as semantic gibberish.
"There over red hour item clear jumble read." has not been identified as syntactic gibberish.
If the topic is "A government is a tremendous burden to business, though a necessary one" and the student response had a statement "India played a match with Pakistan yesterday." then the statement received an irrelevance score of 84.13%. If there was a statement like "A government is a governing body in a country" then it is identified as a statement of fact with a score of 85.99%. A statement "Government is a good source of trouble for businesses." is not gibberish, is relevant and is not a statement of fact according to the system. The value word 'good' made the sentence a statement of opinion, and hence it was identified not to be a statement of fact.
Thus, judging by the scores assigned by each of the modules and their correct identification of sentences in the respective categories, the system has been able to achieve the expected results.
Following is a detailed run of the system, with details about some sample essays that the user has given for some prompts used in the system. Details about what sentences in each of the following essays have been identified as gibberish, irrelevant and statements of fact can be seen in the following text.
The following is a response which was graded as a 3 point essay. Our EGAL system gave the essay a score of 3.49.
This was for the prompt:
Leisure time is becoming an increasingly rare commodity, largely because technology has failed to achieve its goal of improving our efficiency in our daily pursuits. In your view, how accurate is the statement above? Use relevant reasons and/or examples from your experience, observations, or reading to support your viewpoint.
The user response is :
Picture this, a family sitting down for breakfast. The father at the head of the table asking everyone what their agenda is for the day. Suddenly he looks at his watch, then with a frantic look on his face, he lets out a bellowing roar of I'm late. Every one looks at each other and scrambles to get thier belongings for the day. Five minutes later everyone meets at the family vehicle and files in. The car speeds away and everyone is off to their busy filled day. you would think that with today's technology, the family would be able to sit down together and enjoy breakfast without being rushed, but in todays society this is not the case. It seems like the more we are advanced in technology the more we pack into our schedultes eliminatingfree time. We are trained as children to work as hard as we can, to advance ourseveles in careers or growth and any relaxation could be viewed as laziness by out parents or peers. Though we do have the technology which could enable us to live stress free lives, we choose to use it to our benefit, but instead of taking advantage of our newly created "spare time", we bog ourselves with more work. Let's take the father of this family who is a well known executive at a prominant accounting firm. He is the man that solves all the problems and has all the answers for his company. During his lunch hour he sits and calculates numbers instead of enjoying himslef and relaxing. "No time for rest" is his motto. When his boss says we're going to give you a half day today, he decides to spend it on the golf course discussin work. He has no time for his family and always seems to be found in his office when at home. This is a very unhealthy way of live and could be damaging to the raising of his children. The children pick up patterns at a very young age. Grwoing up we are trained by our parents subcounciously. These children from a very young age are taught that leisure time is wrong. 
At a young age that children are subjected to little league and ballet, as a detourant of cutting into their parents time. In these activities childrn are pushed to their fullest potential, allowing them to accompish the honor roll, class president, or even valedictorian for there graduating class. It is great that the children have such drive, but without relaxation or leisure time it oculd lead to psychological problems or mental breakdowns. Even though technology has created free or leisure time, we as individuals need to learn to take advantage of it. We have been trained at a very young age always to be busy. When were not working on deadline or have meeting to be at we are often wondering what do we do with ourselves. The fact of the matter is that we do have the technology to make our lives a lot easier, we just need to take advantage of it, if we don't we could end up seriously injured physically, or even more detrminetal psychologically.
-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------
Type of Sentences Identified    Percentage
Gibberish                       7.41
Irrelevance                     25.93
Facts                           18.52
Ordinary                        48.15
gibberish sentences were as follows
Sentence It seems like the more we are advanced in technology the more we pack into our schedultes eliminating free time. Points 91.16% Semantic gibberish
Sentence He is the man that solves all the problems and has all the answers for his company. Points 90.30% Semantic gibberish
irrelevant sentences were as follows:
Sentence Picture this, a family sitting down for breakfast. Points 76.86%
Sentence Let's take the father of this family who is a well known executive at a prominant accounting firm. Points 76.08%
Sentence During his lunch hour he sits and calculates numbers instead of enjoying himslef and relaxing. Points 70.35%
Sentence This is a very unhealthy way of live and could be damaging to the raising of his children. Points 70.30%
Sentence Grwoing up we are trained by our parents subcounciously. Points 83.51%
Sentence We have been trained at a very young age always to be busy. Points 76.97%
Sentence When were not working on deadline or have meeting to be at we are often wondering what do we do with ourselves. Points 74.89%
fact sentences were as follows:
Sentence The father at the head of the table asking everyone what their agenda is for the day. Points 61.44% I can say this is a fact with 61.44 percent confidence
Sentence Five minutes later everyone meets at the family vehicle and files in. Points 0.00% I can say this is a fact with 0.00 percent confidence
Sentence The car speeds away and everyone is off to their busy filled day. Points 41.31% I can say this is a fact with 41.31 percent confidence
Sentence "No time for rest" is his motto. Points 62.20% I can say this is a fact with 62.20 percent confidence
Sentence The children pick up patterns at a very young age. Points 27.00% I can say this is a fact with 27.00 percent confidence
ordinary sentences were as follows:
Sentence Suddenly he looks at his watch, then with a frantic look on his face, he lets out a bellowing roar of I'm late. Message This sentence is a statement of fact
Nouns: "face" "roar" "watch" "look" Subjects: "he """ Google query: "he """ "face" "roar" "watch" "look" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence Every one looks at each other and scrambles to get thier belongings for the day. Message This sentence is a statement of fact
Nouns: "scrambles" "one" "thier" "belongings" "day" Subjects: "scrambles """ Google query: "scrambles """ "scrambles" "one" "thier" "belongings" "day" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence you would think that with today's technology, the family would be able to sit down together and enjoy breakfast without being rushed, but in todays society this is not the case. Message This sentence is in future tense
Sentence We are trained as children to work as hard as we can, to advance ourseveles in careers or growth and any relaxation could be viewed as laziness by out parents or peers. Message This sentence is in future tense
Sentence Though we do have the technology which could enable us to live stress free lives, we choose to use it to our benefit, but instead of taking advantage of our newly created "spare time", we bog ourselves with more work. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence When his boss says we're going to give you a half day today, he decides to spend it on the golf course discussin work. Message This sentence is in future tense
Sentence He has no time for his family and always seems to be found in his office when at home. Message This sentence contains the value word home
Sentence These children from a very young age are taught that leisure time is wrong. Message This sentence contains the value word leisure
Sentence At a young age that children are subjected to little league and ballet, as a detourant of cutting into their parents time. Message This sentence contains the value word little
Sentence In these activities childrn are pushed to their fullest potential, allowing them to accompish the honor roll, class president, or even valedictorian for there graduating class. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence It is great that the children have such drive, but without relaxation or leisure time it oculd lead to psychological problems or mental breakdowns. Message This sentence contains the value word great
Sentence Even though technology has created free or leisure time, we as individuals need to learn to take advantage of it. Message This sentence contains the value word leisure
Sentence The fact of the matter is that we do have the technology to make our lives a lot easier, we just need to take advantage of it, if we don't we could end up seriously injured physically, or even more detrminetal psychologically. Message This sentence has comparitive/superlative adverbs/adjectives
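The percentages in the breakdown for this essay are consistent with simple proportions of the sentence counts that the detailed output lists for it (2 gibberish, 7 irrelevant, 5 fact and 13 ordinary sentences, 27 in total). A minimal sketch, assuming each percentage is the category count divided by the total and rounded to two decimals; the function name `category_percentages` is hypothetical and not taken from the EGAL code:

```python
def category_percentages(counts):
    """Return each category's share of all sentences, in percent.

    Assumption: percentage = 100 * count / total, rounded to 2 decimals.
    """
    total = sum(counts.values())
    if total == 0:
        # No sentences at all: report zero for every category.
        return {name: 0.0 for name in counts}
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Sentence counts reported for the first essay.
counts = {"Gibberish": 2, "Irrelevance": 7, "Facts": 5, "Ordinary": 13}
percentages = category_percentages(counts)
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}")
```

Under this assumption the four values match the reported 7.41, 25.93, 18.52 and 48.15.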
The next essay had been graded as a 4-pointer, and the EGAL system evaluated it at 2.61.
The essay topic was
People often complain that products are not made to last. They feel that making products that wear out fairly quickly wastes both natural and human resources. What they fail to see, however, is that such manufacturing practices keep costs down for the consumer and stimulate demand. Which do you find more compelling: the complaint about products that do not last or the response to it? Explain your position using relevant reasons and/or examples drawn from your own experience, observations, or reading.
The user response was
The topic raises the issue of whether, on balance, consumers are damaged or benefited by quality-cutting production methods. Indisputably, many consumer products today are not made to last. Nevertheless, consumers themselves sanction this practice, and they are its ultimate beneficiaries in terms of lower prices, more choices, and a stronger economy. Common sense tells us that sacrificing quality results in a net benefit to consumers and to overall economy. Cutting production corners not only allows a business to reduce a product's retail price, it compels the business to do so, since its competitors will find innovative ways of capturing its market share otherwise. Lower prices stimulates sales, which in turn generate healthy economic activity. Observation also strongly supports this claim. One need only look at successful budget retail stores such as Walmart as evidence that many and perhaps most consumers indeed tend to value price over quality. Do low-quality products waste natural resources? On balance, probably not. Admittedly, to the extent that a product wears out sooner, more material are needed for replacement units. Yet cheaper materials are often synthetics, which conserve natural resources, as in the case of synthetic clothing, dyes and inks, and wood substitutes and composites. Moreover, many synthetics and composites are now actually safer and more durable than their natural counterparts especially in the area of construction materials. Do lower-quality products waste human resources? If by waste we mean use up unnecessarily, the answer is no. Many lower-quality products are machine-made ones that conserve, not waste, human labor for example, machine-stitched or dyed clothing and machine-tooled furniture. Moreover, other machine-made products are actually higher in quality than their man-made counterparts, such as those requiring a precision and consistency that only machines can provide. 
Finally, many cheaply-made products are manufactured and assembled by the lower-cost Asian and Central American labor force a legion for whom the alternative is unemployment and poverty. In these cases, producing lower-quality products does not waste human resources; to the contrary, it creates productive jobs. In the final analysis, cost-cutting production methods benefit consumers, both in the short-term through lower prices and in the long run by way of economic vitality and increased competition. The claim that producing lower-quality product wastes natural and human resources is specious at best.
-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------
Type of Sentences Identified    Percentage
Gibberish                       19.05
Irrelevance                     9.52
Facts                           14.29
Ordinary                        57.14
------------------------------
Percentage of Gibberish Sentences were: 7.41
Percentage of Irrelevant Sentences were: 25.93
Percentage of Fact Sentences were: 18.52
Percentage of Ordinary Sentences were: 48.15
No. of Gibberish Sentences were: 2
No. of Irrelvant Sentences were: 7
No. of Facts Sentences were: 5
No. of Ordinary Sentences were: 13
gibberish sentences were as follows
Sentence It seems like the more we are advanced in technology the more we pack into our schedultes eliminating free time. Points 91.16% Semantic gibberish
Sentence He is the man that solves all the problems and has all the answers for his company. Points 90.30% Semantic gibberish
irrelevant sentences were as follows:
Sentence Picture this, a family sitting down for breakfast. Points 76.86%
Sentence Let's take the father of this family who is a well known executive at a prominant accounting firm. Points 76.08%
Sentence During his lunch hour he sits and calculates numbers instead of enjoying himslef and relaxing. Points 70.35%
Sentence This is a very unhealthy way of live and could be damaging to the raising of his children. Points 70.30%
Sentence Grwoing up we are trained by our parents subcounciously. Points 83.51%
Sentence We have been trained at a very young age always to be busy. Points 76.97%
Sentence When were not working on deadline or have meeting to be at we are often wondering what do we do with ourselves. Points 74.89%
fact sentences were as follows:
Sentence The father at the head of the table asking everyone what their agenda is for the day. Points 61.44% Message This sentence is a statement of fact
Nouns: "head" "father" "table" "asking" "day" "everyone" "agenda" Subjects: "father " Google query: "father " "head" "father" "table" "asking" "day" "everyone" "agenda" site:en.wikipedia.org Looking at page: Talk:George W. Bush 2-gram "what their" : 0 3-gram "is for the" : 0 3-gram "everyone what their" : 0 2-gram "is for" : 3 1-gram "is" : 765 3-gram "agenda is for" : 0 2-gram "their agenda" : 0 2-gram "agenda is" : 0 3-gram "what their agenda" : 0 1-gram "their" : 26 3-gram "their agenda is" : 0 Measure: 8.97019694918424 Min: -15.125 Max: 24.0951969491842 Final score: 61.4356857524281 I can say this is a fact with 61.44 percent confidence
Sentence Five minutes later everyone meets at the family vehicle and files in. Points 0.00% Message This sentence is a statement of fact
Nouns: "files" "minutes" "vehicle" "everyone" "family" Subjects: "everyone " Google query: "everyone " "files" "minutes" "vehicle" "everyone" "family" site:en.wikipedia.org Looking at page: OJ Simpson 1-gram "five" : 0 3-gram "minutes later everyone" : 0 3-gram "five minutes later" : 0 1-gram "later" : 0 2-gram "five minutes" : 0 3-gram "later everyone meets" : 0 2-gram "minutes later" : 0 1-gram "meets" : 0 2-gram "meets at" : 0 2-gram "later everyone" : 0 2-gram "everyone meets" : 0 3-gram "meets at the" : 0 3-gram "everyone meets at" : 0 Measure: -24 Min: -24 Max: 0 Final score: 0 I can say this is a fact with 0.00 percent confidence
Sentence The car speeds away and everyone is off to their busy filled day. Points 41.31% Message This sentence is a statement of fact
Nouns: "car" "day" "everyone" "speeds" Subjects: """everyone " Google query: """everyone " "car" "day" "everyone" "speeds" site:en.wikipedia.org Looking at page: User talk:Arpingstone 2-gram "everyone is" : 0 3-gram "everyone is off" : 0 3-gram "their busy filled" : 0 3-gram "and everyone is" : 0 1-gram "is" : 229 2-gram "speeds away" : 0 2-gram "away and" : 0 3-gram "to their busy" : 0 3-gram "busy filled day" : 0 2-gram "filled day" : 0 3-gram "off to their" : 0 3-gram "is off to" : 0 2-gram "is off" : 0 2-gram "busy filled" : 0 3-gram "speeds away and" : 0 2-gram "their busy" : 0 1-gram "their" : 14 2-gram "to their" : 2 3-gram "car speeds away" : 0 1-gram "away" : 0 3-gram "away and everyone" : 0 1-gram "busy" : 3 1-gram "filled" : 0 Measure: -10.7581034907267 Min: -36.3157894736842 Max: 25.5576859829575 Final score: 41.306368834683 I can say this is a fact with 41.31 percent confidence
Sentence "No time for rest" is his motto. Points 62.20% Message This sentence is a statement of fact
Nouns: "rest" "time" "motto" Subjects: "" Google query: "" "rest" "time" "motto" site:en.wikipedia.org Looking at page: Samson Raphael Hirsch 3-gram "is his motto" : 0 1-gram "no" : 13 2-gram "is his" : 1 1-gram "is" : 178 2-gram "rest is" : 0 3-gram "rest is his" : 0 3-gram "for rest is" : 0 3-gram "no time for" : 0 2-gram "no time" : 0 Measure: 7.74673290775362 Min: -12 Max: 19.7467329077536 Final score: 62.2008348548231 I can say this is a fact with 62.20 percent confidence
Sentence The children pick up patterns at a very young age. Points 27.00% Message This sentence is a statement of fact
Nouns: "patterns" "children" "age" Subjects: "" Google query: "" "patterns" "children" "age" site:en.wikipedia.org Looking at page: Language acquisition 2-gram "a very" : 0 1-gram "young" : 3 2-gram "children pick" : 0 3-gram "a very young" : 0 3-gram "very young age" : 0 2-gram "very young" : 0 3-gram "the children pick" : 0 2-gram "young age" : 0 1-gram "very" : 3 3-gram "at a very" : 0 1-gram "pick" : 0 2-gram "pick up" : 0 3-gram "children pick up" : 0 3-gram "pick up patterns" : 0 Measure: -13.9694420893304 Min: -22.1666666666667 Max: 8.19722457733622 Final score: 26.9966207936384 I can say this is a fact with 27.00 percent confidence
ordinary sentences were as follows:
Sentence Suddenly he looks at his watch, then with a frantic look on his face, he lets out a bellowing roar of I'm late. Message This sentence is a statement of fact
Nouns: "face" "roar" "watch" "look" Subjects: "he """ Google query: "he """ "face" "roar" "watch" "look" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence Every one looks at each other and scrambles to get thier belongings for the day. Message This sentence is a statement of fact
Nouns: "scrambles" "one" "thier" "belongings" "day" Subjects: "scrambles """ Google query: "scrambles """ "scrambles" "one" "thier" "belongings" "day" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence you would think that with today's technology, the family would be able to sit down together and enjoy breakfast without being rushed, but in todays society this is not the case. Message This sentence is in future tense
Sentence We are trained as children to work as hard as we can, to advance ourseveles in careers or growth and any relaxation could be viewed as laziness by out parents or peers. Message This sentence is in future tense
Sentence Though we do have the technology which could enable us to live stress free lives, we choose to use it to our benefit, but instead of taking advantage of our newly created "spare time", we bog ourselves with more work. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence When his boss says we're going to give you a half day today, he decides to spend it on the golf course discussin work. Message This sentence is in future tense
Sentence He has no time for his family and always seems to be found in his office when at home. Message This sentence contains the value word home
Sentence These children from a very young age are taught that leisure time is wrong. Message This sentence contains the value word leisure
Sentence At a young age that children are subjected to little league and ballet, as a detourant of cutting into their parents time. Message This sentence contains the value word little
Sentence In these activities childrn are pushed to their fullest potential, allowing them to accompish the honor roll, class president, or even valedictorian for there graduating class. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence It is great that the children have such drive, but without relaxation or leisure time it oculd lead to psychological problems or mental breakdowns. Message This sentence contains the value word great
Sentence Even though technology has created free or leisure time, we as individuals need to learn to take advantage of it. Message This sentence contains the value word leisure
Sentence The fact of the matter is that we do have the technology to make our lives a lot easier, we just need to take advantage of it, if we don't we could end up seriously injured physically, or even more detrminetal psychologically. Message This sentence has comparitive/superlative adverbs/adjectives
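In the detailed fact-sentence output above, every reported final score is consistent with a min-max rescaling of the reported Measure between the reported Min and Max. A minimal sketch under that assumption; the function name `final_score` is hypothetical, and how EGAL derives Measure, Min and Max from the n-gram counts is not reconstructed here:

```python
def final_score(measure, lo, hi):
    """Rescale `measure` from the range [lo, hi] to a 0-100 confidence.

    Assumption: score = 100 * (measure - lo) / (hi - lo).
    Returning 0.0 for an empty range is also an assumption.
    """
    if hi == lo:
        return 0.0
    return 100.0 * (measure - lo) / (hi - lo)


# (Measure, Min, Max, reported confidence) copied from the output above.
reported = [
    (8.97019694918424, -15.125, 24.0951969491842, 61.44),
    (-24.0, -24.0, 0.0, 0.00),
    (-10.7581034907267, -36.3157894736842, 25.5576859829575, 41.31),
    (7.74673290775362, -12.0, 19.7467329077536, 62.20),
    (-13.9694420893304, -22.1666666666667, 8.19722457733622, 27.00),
]
for measure, lo, hi, shown in reported:
    score = final_score(measure, lo, hi)
    print(f"{score:.2f} (reported {shown:.2f})")
```

A sentence whose Measure equals its Min, such as the second one, therefore receives a confidence of 0.00 percent.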
For a third essay, the EGAL system graded the essay as a 4.0-pointer, whereas the essay had been graded as a 5.0-pointer.
The topic of the essay was :
In some countries, television and radio programs are carefully censored for offensive language and behavior. In other countries, there is little or no censorship. In your view, to what extent should government or any other group be able to censor television or radio programs? Explain, giving relevant reasons and/or examples to support your position.
The user response was :
I beg to differ with the speaker's contention which seems to imply that the goal of technology is not only to increase effciency but also our leisure time. Also interwoven in the speaker's statement is the fallacious assumption that they are connected. So we have three points which need to be considered - technological advances, efficiency & leisure - and how they are related. The aim of technological advance (progress in applied sciences), as far as I know, is to apply scientific data and discoveries toward practical and beneficial use. For instance we've used new knowledge of Particle Physics in diagnosing medical conditions - eg. through Magneto Resonance Imagery - and also in treatment - eg., radiotherapy. Did this technological advance and the motivation behind it really have anything to do with efficiency? Only in that efficiency might be a by-product of a certain technology , but I do not think it was the primary objective. Of course the by-product of certain new technologies might be "efficiency" but to what extent? Computers are typically cited as a perfect example. Yes they do help us get more work done without expending as much energy. But we need to factor in the time and energy required in learning how to efficiently operate one, and then expended in keeping our learning up to date with the rapid technological advances in the same. (A person with the energy to compile and critically analyze the data constructively to formulate the answer to that one will definitely need an advanced computer!) So its possible that even computers don't in the end improve the efficiency of our daily lives, in net terms. And then, there is the question of "leisure". Personally I think it is a matter of choice and not time saving ,technologically advanced, efficient tools. The speaker seems to assume that the time "saved" (we are still waiting for the verdict on that one) will be spent towards leisure. I do not see the connection. 
Ulitmately the motivation of a person, personality & lifestyle choices and circumstances determine how the time that is saved is used. It could be towards leisure in one person's case; in another's towards putting in more hours to make more money to make ends meet or to buy that new car which he/she absolutely must have. In the end I think there is no clear connection between the three points under consideration. Hence in the absence of the relationship between technology, efficiency & leisure claimed by the speaker I disagree on whole.
Percentage of Gibberish Sentences were: 13.64
Percentage of Irrelevant Sentences were: 40.91
Percentage of Fact Sentences were: 0.00
Percentage of Ordinary Sentences were: 45.45
No. of Gibberish Sentences were: 3
No. of Irrelvant Sentences were: 9
No. of Facts Sentences were: 0
No. of Ordinary Sentences were: 10
gibberish sentences were as follows
Sentence I beg to differ with the speaker's contention which seems to simply that the goal of technology is not only to increase effciency but also our leisure time. Points 90.07% Semantic gibberish
Sentence Computers are typically cited as a perfect example. Points 94.87% Semantic gibberish
Sentence So its possible that even computers don't in the end improve the efficiency of our daily lives, in net terms. Points 90.68% Semantic gibberish
irrelevant sentences were as follows:
Sentence Also interwoven in the speaker's statement is the fallacious assumption that they are connected. Points 71.23%
Sentence So we have three points which need to be considered - technological advances, efficiency & leisure - and how they are related. Points 75.51%
Sentence For instance we've used new knowledge of Particle Physics in diagnosing medical conditions - eg. Points 70.60%
Sentence Did this technological advance and the motivation behind it really have anything to do with efficiency? Points 73.89%
Sentence But we need to factor in the time and energy required in learning how to efficiently operate one, and then expended in keeping our learning up to date with the rapid technological advances in the same. Points 76.65%
Sentence And then, there is the question of "leisure". Points 76.20%
Sentence The speaker seems to assume that the time "saved" (we are still waiting for the verdict on that one) will be spent towards leisure. Points 70.48%
Sentence I do not see the connection. Points 76.59%
Sentence In the end I think there is no clear connection between the three points under consideration. Points 73.67%
no fact sentences were found
ordinary sentences were as follows:
Sentence The aim of technological advance (progress in applied sciences), as far as I know, is to apply scientific data and discoveries toward practical and beneficial use. Message This sentence contains the value word progress
Sentence through Magneto Resonance Imagery - and also in treatment - eg., radiotherapy. Message This sentence is a statement of fact
Nouns: "magneto" Subjects: "Magneto Resonance Imagery " Google query: "Magneto Resonance Imagery " "magneto" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence Only in that efficiency might be a by-product of a certain technology , but I do not think it was the primary objective. Message This sentence is in future tense
Sentence Of course the by-product of certain new technologies might be"efficiency" but to what extent? Message This sentence is in future tense
Sentence Yes they do help us get more work done without expending as much energy. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence (A person with the energy to compile and critically analyze the data constructively to formulate the answer to that one will definitely need an advanced computer!) Message This sentence is in future tense
Sentence Personally I think it is a matter of choice and not time saving ,technologically advanced, efficient tools. Message This sentence contains the value word choice
Sentence Ulitmately the motivation of a person, personality & lifestyle choices and circumstances determine how the time that is saved is used. Message This sentence contains the value word motivation
Sentence It could be towards leisure in one person's case; in another's towards putting in more hours to make more money to make ends meet or to buy that new car which he/she absolutely must have. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence Hence in the absence of the relationship between technology, efficiency & leisure claimed by the speaker I disagree on whole. Message This sentence contains the value word efficiency
The fourth essay is a combination of two 3-point essays written by our fellow students for a common prompt given in class. The EGAL system graded this combined essay at 3.62.
The topic was
Automated essay scoring is unfair to students, since there are many different ways for a student to express ideas intelligently and coherently. A computer program can not be expected to anticipate all of these possibilities, and will therefore grade students more harshly than they deserve. Discuss whether you agree or disagree (partially or totally) with the view expressed providing reasons and examples.
The student response was
Many students write essays for exam, some write good, some bad, some worst. The compter cannot get this idea of good, bad, and worst. It just tries to find information the ways it is supposed to and scores that way. Thts why automated essay scoring is unfair to students. For examples, many student know that the computer checks the essays, therefore they learn how to ebat the machine rather then being creative to write their essays and learn something. In this case the computer would just be too good to score and give good score to someone who cant even write a creative sentence nor can write grammatical correct sentece. Another point here we can see is that the computer learns from essays written by students, now if the sample essays are not that creative then if someone writes creative essay it will just get confused and give some random score, if not then it will give the best possible score which is still not correct. So is it fair to get garbage or grate score from machine? Last but not least, computer scoring ethically is not good. To conclude I would say stop this computer scoring, it doesnt make sense when you write an essay thinking hard and someobody who is not real grades it for you and gives you something that you dont confidence about. Automated essay scoring is used in GMAT and GRE. It uses computers to grade student essays. Using computers makes work faster. It also reduces the work load on human graders. First of all, the difference between human graded essay and computer graded essay is not very significant. Both tend to be the same. If a good computer is used this difference can be reduced. Even though each student has his own views the computer is intelligent enough to catch the differences and grade accordingly. As a student writes an essay the computer can understand how the student writes and this will help it in grading. Thus, the computer does not grade an essay harshly than humans. 
It just has some difference but not to a greater extent and such deviations can be ignored as even humans make mistakes. Finally, I feel that the usage of automated essay scoring is good and should be followed by all including schools and universities. It will help students and teachers a lot.
-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------
Type of Sentences Identified    Percentage
Gibberish                       17.39
Irrelevance                     17.39
Facts                           26.09
Ordinary                        39.13
Percentage of Gibberish Sentences were: 17.39
Percentage of Irrelevant Sentences were: 17.39
Percentage of Fact Sentences were: 26.09
Percentage of Ordinary Sentences were: 39.13
No. of Gibberish Sentences were: 4
No. of Irrelvant Sentences were: 4
No. of Facts Sentences were: 6
No. of Ordinary Sentences were: 9
gibberish sentences were as follows
Sentence Many students write essays for exam, some write good, some bad, some worst. Points 91.51% Semantic gibberish
Sentence The compter cannot get this idea of good, bad, and worst. Points 63.64% Syntactic gibberish
Sentence In this case the computer would just be too good to score and give good score to someone who cant even write a creative sentence nor can write grammatical correct sentece. Points 92.21 Semantic gibberish
Sentence Using computers makes work faster. Points 94.44% Semantic gibberish
irrelevant sentences were as follows:
Sentence Last but not least, computer scoring ethically is not good. Points 75.51%
Sentence Automated essay scoring is used in GMAT and GRE. Points 72.83%
Sentence If a good computer is used this difference can be reduced. Points 70.31%
Sentence It just has some difference but not to a greater extent and such deviations can be ignored as even humans make mistakes. Points 75.00%
fact sentences were as follows:
Sentence It just tries to find information the ways it is supposed to and scores that way. Points 40.23% I can say this is a fact with 40.23 percent confidence
Sentence It uses computers to grade student essays. Points 0.00% I can say this is a fact with 0.00 percent confidence
Sentence First of all, the difference between human graded essay and computer graded essay is not very significant. Points 22.31% I can say this is a fact with 22.31 percent confidence
Sentence Both tend to be the same. Points 14.48% I can say this is a fact with 14.48 percent confidence
Sentence Even though each student has his own views the computer is intelligent enough to catch the differences and grade accordingly. Points 37.05% I can say this is a fact with 37.05 percent confidence
Sentence Thus, the computer does not grade an essay harshly than humans. Points 13.05% I can say this is a fact with 13.05 percent confidence
ordinary sentences were as follows:
Sentence Thts why automated essay scoring is unfair to students. Message This sentence is a statement of fact
Nouns: "thts" Subjects: "scoring " Google query: "scoring " "thts" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence For examples, many student know that the computer checks the essays, therefore they learn how to ebat the machine rather then being creative to write their essays and learn something. Message This sentence is a statement of fact
Nouns: "checks" "ebat" "essays" "student" "computer" "machine" "examples" "something" Subjects: """examples " Google query: """examples " "checks" "ebat" "essays" "student" "computer" "machine" "examples" "something" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence Another point here we can see is that the computer learns from essays written by students, now if the sample essays are not that creative then if someone writes creative essay it will just get confused and give some random score, if not then it will give the best possible score which is still not correct. Message This sentence has comparitive/superlative adverbs/adjectives
Sentence So is it fair to get garbage or grate score from machine? Message This sentence is a statement of fact
Nouns: "score" "grate" "garbage" "machine" Subjects: Google query: "score" "grate" "garbage" "machine" site:en.wikipedia.org System cannot verify this using the existing knowledge source.
Sentence To conclude I would say stop this computer scoring, it doesnt make sense when you write an essay thinking hard and someobody who is not real grades it for you and gives you something that you dont confidence about. Message This sentence is in future tense
Sentence It also reduces the work load on human graders. Message This sentence contains the value word work
Sentence As a student writes an essay the computer can understand how the student writes and this will help it in grading. Message This sentence is in future tense
Sentence Finally, I feel that the usage of automated essay scoring is good and should be followed by all including schools and universities. Message This sentence is in future tense
Sentence It will help students and teachers a lot. Message This sentence is in future tense
For the fifth essay sample, the response is simply the topic/prompt itself, submitted verbatim as the essay. The EGAL system gave this tricky response a score of 1.96.
The topic/prompt was:
People often complain that products are not made to last. They feel that making products that wear out fairly quickly wastes both natural and human resources. What they fail to see, however, is that such manufacturing practices keep costs down for the consumer and stimulate demand. Which do you find more compelling: the complaint about products that do not last or the response to it? Explain your position using relevant reasons and/or examples drawn from your own experience, observations, or reading.
The response entered was:
People often complain that products are not made to last. They feel that making products that wear out fairly quickly wastes both natural and human resources. What they fail to see, however, is that such manufacturing practices keep costs down for the consumer and stimulate demand. Which do you find more compelling: the complaint about products that do not last or the response to it? Explain your position using relevant reasons and/or examples drawn from your own experience, observations, or reading.
A response like this would usually trick a system, because it appears relevant to the topic without actually answering it. Here is how EGAL handles it:
--------------------------------------------------
DETAILED DIAGNOSTICS OF THE ESSAY:
--------------------------------------------------

Type of Sentences Identified    Percentage
Gibberish                       40.00
Irrelevance                     20.00
Facts                           20.00
Ordinary                        20.00

Percentage of Gibberish Sentences were: 40.00
Percentage of Irrelevant Sentences were: 20.00
Percentage of Fact Sentences were: 20.00
Percentage of Ordinary Sentences were: 20.00
No. of Gibberish Sentences were: 2
No. of Irrelvant Sentences were: 1
No. of Facts Sentences were: 1
No. of Ordinary Sentences were: 1

gibberish sentences were as follows

Sentence: People often complain that products are not made to last.
Points: 92.59%
Semantic gibberish

Sentence: They feel that making products that wear out fairly quickly wastes both natural and human resources.
Points: 90.64%
Semantic gibberish

irrelevant sentences were as follows:

Sentence: Explain your position using relevant reasons and/or examples drawn from your own experience, observations, or reading.
Points: 73.18%

fact sentences were as follows:

Sentence: What they fail to see, however, is that such manufacturing practices keep costs down for the consumer and stimulate demand.
Points: 36.74%
I can say this is a fact with 36.74 percent confidence

ordinary sentences were as follows:

Sentence: Which do you find more compelling: the complaint about products that do not last or the response to it?
Message: This sentence has comparitive/superlative adverbs/adjectives
For the sixth essay, we considered a 6-point essay, which our EGAL system graded as 4.99. The essay topic was:
Automated essay scoring is unfair to students, since there are many different ways for a student to express ideas intelligently and coherently. A computer program can not be expected to anticipate all of these possibilities, and will therefore grade students more harshly than they deserve. Discuss whether you agree or disagree (partially or totally) with the view expressed providing reasons and examples.
The user response was:
I strongly disagree with the argument that automated essay scoring is unfair to students. The automated essay scoring systems in use are carefully designed by natural language processing experts. In fact, they are proven to be comparable, if not better, to a human grader. Automated essay scoring systems might grasp the nuances of every witty writing, but it certainly does well, the task for which it is assigned - namely grading of analytical writing essays. Argument claims that a student can express an idea in ways not known to the automated system thus resulting in a poor score. To refute this claim I must point out the fact that there are many NLP techniques which look at the general characterisitcs of a good essay, rather than a particular way, to decide a score. Even the most ingeniously different essay follows the guidelines of a good essay, otherwise it will not be able to represent the idea coherently. Second, the issue of harshness of the automated system is really irrelevant. However harsh a system might be, as long as it is the common denominator for all the essays, the relative scores are still the same. Therefore there is no unfair harshness here. In any case, automated grading is proven to be as harsh as a human grader. Third, we must not ignore the benefits of the automated essay scoring. It is cost effective - it is half as expensive as a human grader. Like all machine-based approach, it does not suffer from errors due to fatigue and mental state. There is no chance that the system is biased towards any particular student. The The speed of such a system will only increse as more processing power is added and new techniques are developed. In conclusion, I would like to say that automated essay scoring are a very fair way of scoring, if implemented correctly. However, there is, and always should be, atleast one human grader in the scoring process to take care of the anamolies that might arise in certain rare cases.
--------------------------------------------------
DETAILED DIAGNOSTICS OF THE ESSAY:
--------------------------------------------------

Type of Sentences Identified    Percentage
Gibberish                       11.11
Irrelevance                     11.11
Facts                           16.67
Ordinary                        61.11

Percentage of Gibberish Sentences were: 11.11
Percentage of Irrelevant Sentences were: 11.11
Percentage of Fact Sentences were: 16.67
Percentage of Ordinary Sentences were: 61.11
No. of Gibberish Sentences were: 2
No. of Irrelvant Sentences were: 2
No. of Facts Sentences were: 3
No. of Ordinary Sentences were: 11

gibberish sentences were as follows

Sentence: Third, we must not ignore the benefits of the automated essay scoring.
Points: 92.86%
Semantic gibberish

Sentence: There is no chance that the system is biased towards any particular student.
Points: 92.59%
Semantic gibberish

irrelevant sentences were as follows:

Sentence: Therefore there is no unfair harshness here.
Points: 82.83%

Sentence: In any case, automated grading is proven to be as harsh as a human grader.
Points: 79.08%

fact sentences were as follows:

Sentence: I strongly disagree with the argument that automated essay scoring is unfair to students.
Points: 37.19%
I can say this is a fact with 37.19 percent confidence

Sentence: The automated essay scoring systems in use are carefully designed by natural language processing experts.
Points: 40.31%
I can say this is a fact with 40.31 percent confidence

Sentence: In fact, they are proven to be comparable, if not better, to a human grader.
Points: 49.13%
I can say this is a fact with 49.13 percent confidence

ordinary sentences were as follows:

Sentence: Automated essay scoring systems might grasp the nuances of every witty writing, but it certainly does well, the task for which it is assigned - namely grading of analytical writing essays.
Message: This sentence is in future tense

Sentence: Argument claims that a student can express an idea in ways not known to the automated system thus resulting in a poor score.
Message: This sentence is in future tense

Sentence: To refute this claim I must point out the fact that there are many NLP techniques which look at the general characterisitcs of a good essay, rather than a particular way, to decide a score.
Message: This sentence is in future tense

Sentence: Even the most ingeniously different essay follows the guidelines of a good essay, otherwise it will not be able to represent the idea coherently.
Message: This sentence has comparitive/superlative adverbs/adjectives

Sentence: Second, the issue of harshness of the automated system is really irrelevant.
Message: This sentence contains the value word harshness

Sentence: However harsh a system might be, as long as it is the common denominator for all the essays, the relative scores are still the same.
Message: This sentence is in future tense

Sentence: It is cost effective - it is half as expensive as a human grader.
Message: This sentence contains the value word expensive

Sentence: Like all machine-based approach, it does not suffer from errors due to fatigue and mental state.
Message: This sentence contains the value word fatigue

Sentence: The The speed of such a system will only increse as more processing power is added and new techniques are developed.
Message: This sentence has comparitive/superlative adverbs/adjectives

Sentence: In conclusion, I would like to say that automated essay scoring are a very fair way of scoring, if implemented correctly.
Message: This sentence is in future tense

Sentence: However, there is, and always should be, atleast one human grader in the scoring process to take care of the anamolies that might arise in certain rare cases.
Message: This sentence is in future tense
For the seventh essay, we took the same 6-point essay written for the previous topic and evaluated it against a different topic. The EGAL system graded it as 2.99.
The topic was:
In some countries, television and radio programs are carefully censored for offensive language and behavior. In other countries, there is little or no censorship. In your view, to what extent should government or any other group be able to censor television or radio programs? Explain, giving relevant reasons and/or examples to support your position.
The response was the same as the one used for the sixth essay.
--------------------------------------------------
DETAILED DIAGNOSTICS OF THE ESSAY:
--------------------------------------------------

Type of Sentences Identified    Percentage
Gibberish                       11.11
Irrelevance                     38.89
Facts                           5.56
Ordinary                        44.44

Percentage of Gibberish Sentences were: 11.11
Percentage of Irrelevant Sentences were: 38.89
Percentage of Fact Sentences were: 5.56
Percentage of Ordinary Sentences were: 44.44
No. of Gibberish Sentences were: 2
No. of Irrelvant Sentences were: 7
No. of Facts Sentences were: 1
No. of Ordinary Sentences were: 8

gibberish sentences were as follows

Sentence: Third, we must not ignore the benefits of the automated essay scoring.
Points: 92.86%
Semantic gibberish

Sentence: There is no chance that the system is biased towards any particular student.
Points: 92.59%
Semantic gibberish

irrelevant sentences were as follows:

Sentence: I strongly disagree with the argument that automated essay scoring is unfair to students.
Points: 70.55%

Sentence: In fact, they are proven to be comparable, if not better, to a human grader.
Points: 74.49%

Sentence: Second, the issue of harshness of the automated system is really irrelevant.
Points: 74.37%

Sentence: Therefore there is no unfair harshness here.
Points: 84.85%

Sentence: In any case, automated grading is proven to be as harsh as a human grader.
Points: 72.75%

Sentence: It is cost effective - it is half as expensive as a human grader.
Points: 78.86%

Sentence: In conclusion, I would like to say that automated essay scoring are a very fair way of scoring, if implemented correctly.
Points: 73.77%

fact sentences were as follows:

Sentence: The automated essay scoring systems in use are carefully designed by natural language processing experts.
Points: 40.31%
I can say this is a fact with 40.31 percent confidence

ordinary sentences were as follows:

Sentence: Automated essay scoring systems might grasp the nuances of every witty writing, but it certainly does well, the task for which it is assigned - namely grading of analytical writing essays.
Message: This sentence is in future tense

Sentence: Argument claims that a student can express an idea in ways not known to the automated system thus resulting in a poor score.
Message: This sentence is in future tense

Sentence: To refute this claim I must point out the fact that there are many NLP techniques which look at the general characterisitcs of a good essay, rather than a particular way, to decide a score.
Message: This sentence is in future tense

Sentence: Even the most ingeniously different essay follows the guidelines of a good essay, otherwise it will not be able to represent the idea coherently.
Message: This sentence has comparitive/superlative adverbs/adjectives

Sentence: However harsh a system might be, as long as it is the common denominator for all the essays, the relative scores are still the same.
Message: This sentence is in future tense

Sentence: Like all machine-based approach, it does not suffer from errors due to fatigue and mental state.
Message: This sentence contains the value word fatigue

Sentence: The The speed of such a system will only increse as more processing power is added and new techniques are developed.
Message: This sentence has comparitive/superlative adverbs/adjectives

Sentence: However, there is, and always should be, atleast one human grader in the scoring process to take care of the anamolies that might arise in certain rare cases.
Message: This sentence is in future tense
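The percentage rows in the diagnostics above appear to follow directly from the sentence counts: each category's count divided by the total number of sentences, shown to two decimals. The short Python sketch below reproduces the figures reported for the sixth and seventh essays under that assumption. The function name `category_percentages` is ours and purely illustrative; it is not part of EGAL.

```python
def category_percentages(counts):
    """Return each category's share of all sentences as a 2-decimal string.

    Assumption: the reported percentage is count / total * 100, which is
    consistent with the diagnostics shown above.
    """
    total = sum(counts.values())
    if total == 0:
        # No sentences at all: report zero for every category.
        return {name: "0.00" for name in counts}
    return {name: f"{100.0 * n / total:.2f}" for name, n in counts.items()}


# Sentence counts taken from the diagnostics of the sixth and seventh essays.
sixth = {"Gibberish": 2, "Irrelevance": 2, "Facts": 3, "Ordinary": 11}
seventh = {"Gibberish": 2, "Irrelevance": 7, "Facts": 1, "Ordinary": 8}

for label, counts in (("sixth", sixth), ("seventh", seventh)):
    print(label)
    for name, pct in category_percentages(counts).items():
        print(f"  {name:<12} {pct}")
```

The same essay yields very different category shares for the two topics because only the relevance judgement changes: seven sentences are flagged as irrelevant against the censorship topic, compared with two against the original topic.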
We would like to acknowledge several systems that we used as part of our own. For gibberish detection, we used Link Grammar, a grammar parser[4]. This parser lets us find unused links and unknown words in a sentence, which is how we identify syntactic gibberish. We used the WordNet::Similarity package, built on WordNet 2.0[10], to find the semantic similarity between two words. This is used both to identify semantic gibberish and to judge the relevance of a sentence to the topic. For identifying statements of fact, we used ideas from the online resources [2], [6], [7] and [8]. Each of these references helped us decide the basis on which to distinguish facts from opinions and to identify the properties of a statement of fact. To check the accuracy of these statements of fact, we use the idea of a fact repository from Static Knowledge Sources[9], which is wikipedia.org[3] in our case. We access this online encyclopedia by constructing a Google query using the Google API[11] and firing the query to retrieve the wikitext of the first match found by Google in en.wikipedia.org.
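As an illustration of the query construction described above, the following Python sketch assembles a query string of the shape seen in the diagnostics: the subjects and nouns of a sentence, each quoted, followed by a `site:en.wikipedia.org` restriction. It only builds the string and does not call any search API. The function name `build_wikipedia_query` is hypothetical, and stripping whitespace, skipping empty terms and dropping duplicates are our own tidy-ups; the raw EGAL output shown earlier keeps them.

```python
def build_wikipedia_query(subjects, nouns, site="en.wikipedia.org"):
    """Build a quoted-term search query restricted to one site.

    Assumption: subjects come first, then nouns, as in the diagnostics.
    Empty terms and repeats are dropped (our choice, not EGAL's).
    """
    terms = []
    for word in list(subjects) + list(nouns):
        word = word.strip()
        if word and word not in terms:
            terms.append(word)
    quoted = " ".join(f'"{w}"' for w in terms)
    return f"{quoted} site:{site}".strip()


# Terms taken from two of the sentences in the diagnostics above.
print(build_wikipedia_query(["scoring "], ["thts"]))
print(build_wikipedia_query([], ["score", "grate", "garbage", "machine"]))
```

A sentence with no detected subject still produces a usable query from its nouns alone, which matches the third example in the diagnostics.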
[1] Page, E.B., 1994, New Computer grading of student prose, using modern concepts and software, Journal of Experimental Education
[2] Identification of a Fact, http://et.sdsu.edu/saeria/671/facts/fact-identification.html
[3] Wikipedia, http://en.wikipedia.org/wiki/Main_Page
[4] Link Grammar, http://www.link.cs.cmu.edu/link/
[5] Peter W. Foltz, Darrell Laham, Thomas K. Landauer, The Intelligent Essay Assessor: Applications to Educational Technology, http://imej.wfu.edu/articles/1999/2/04/index.asp
[6] Langan, John, ``Ten Steps to Improving Reading Skills'', http://www.waycross.edu/ismt/fact.htm
[7] Human Value Project, http://www.uia.org/values/vztab12.htm, Union of International Associations 1997 - 2004
[8] Fact Identification Strategies, http://et.sdsu.edu/saeria/671/facts/fact-identification.html
[9] The Static Knowledge Sources: Ontology, Fact Repository and Lexicons, http://ilit.umbc.edu/Book/sks.htm
[10] WordNet 2.0, ``A lexical database for the English language'', http://www.cogsci.princeton.edu/~wn/wn2.0.shtml
[11] Google API, http://www.google.com/apis/
[12] Valenti, S., Neri, F., and Cucchiarelli, A., 2003, An Overview of Current Research on Automated Essay Grading. Journal of Information Technology Education (JITE), Vol. 2
[13] Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., and Harshman, R.A., 1990, Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science
[14] Hearst, M., 2000, The debate on automated essay grading. IEEE Intelligent Systems
[15] Burstein, J., Kukich, K., Wolff, S., Chi, L., and Chodorow, M., 1998, Enriching automated essay scoring using discourse marking. Proceedings of the Workshop on Discourse Relations and Discourse Marking, Annual Meeting of the Association of Computational Linguistics, Montreal, Canada.
[16] Burstein, J., Leacock, C., and Swartz, R., 2001, Automated evaluation of essay and short answers. Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.
[17] Christie, J.R., 1999, Automated essay marking-for both style and content. Proceedings of the Third Annual Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK
[18] Ming, P.Y., Mikhailov, A.A., and Kuan, T.L., 2000, Intelligent essay marking system. Learners Together, Feb 2000, Ngee Ann Polytechnic, Singapore http://ipdweb.np.edu.sg/lt/feb00/intelligent_essay_marking.pdf IEMS
[19] Mitchell, T., Russel, T., Broomhead, P., and Aldridge, N., 2002, Towards robust computerized marking of free-text responses. Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.
[20] Landauer, T.K., Foltz, P.W., and Laham, D., 1998, An introduction to latent semantic analysis. Discourse Processes, http://lsa.colorado.edu/pepers/dp1.LSAintro.pdf
[21] List of Value words, http://www.d.umn.edu/~dodd0036/stoplist.txt
[22] List of Fact words, http://www.d.umn.edu/~dodd0036/factlist.txt