IntelliMetric®: Frequently Asked Questions

What is IntelliMetric®?

IntelliMetric® is an intelligent scoring system that emulates the process carried out by human scorers. IntelliMetric® is theoretically grounded in a cognitive model often referred to as a “brain-based” or “mind-based” model of information processing and understanding. IntelliMetric® draws upon the traditions of Cognitive Processing, Artificial Intelligence, Natural Language Understanding, and Computational Linguistics in the process of evaluating written text.

How long has IntelliMetric® been used?

IntelliMetric® has been used to score essays operationally since 1998. The scoring system has been refined many times over the years to improve its scoring accuracy.

How does the IntelliMetric® artificial intelligence scoring engine work?

IntelliMetric® emulates the process carried out by human scorers. The system must be “trained” with a set of previously scored responses containing “known score” marker papers for each score point. These papers are used as a basis for the system to infer the rubric and the pooled judgments of the human scorers. The IntelliMetric® system “internalizes” the characteristics of the responses associated with each score point and applies this intelligence in subsequent scoring. The approach is consistent with the procedure underlying holistic scoring. IntelliMetric® creates a unique solution for each stimulus or prompt. This is conceptually similar to prompt-specific training for human scorers. For this reason, IntelliMetric® is able to achieve both high correlations with the scores of human readers and high percentages of scores that match those awarded by humans.
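The correlation mentioned above can be illustrated with a minimal sketch. It computes a Pearson correlation between human scores and engine scores for the same essays. The function name and the sample scores are illustrative assumptions; the source does not say which correlation statistic is used with IntelliMetric®.

```python
from math import sqrt


def pearson_correlation(human_scores, engine_scores):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(human_scores)
    if n != len(engine_scores) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_h = sum(human_scores) / n
    mean_e = sum(engine_scores) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((h - mean_h) * (e - mean_e)
              for h, e in zip(human_scores, engine_scores))
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    var_e = sum((e - mean_e) ** 2 for e in engine_scores)
    if var_h == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_h * var_e)


# Made-up scores on a 6-point scale, for illustration only.
human = [1, 2, 3, 4, 5, 6, 3, 4]
engine = [1, 2, 3, 4, 5, 6, 4, 4]
print(round(pearson_correlation(human, engine), 3))
```

A value close to 1 means the engine ranks essays much as the human readers do. It does not by itself show that the scores match point for point.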

How do we know that IntelliMetric® works?

To evaluate whether IntelliMetric® scores essays accurately, we put it to the same tests that we would give an expert human rater. After a human rater is trained, he or she is asked to score a set of essays that have previously been scored by experts. The new scorer’s scores are then compared with the “known” scores to determine the agreement rate. If the human rater meets the criteria for acceptable agreement, the rater is allowed to score new essays. Similarly, after IntelliMetric® is trained, it is asked to score a set of essays that were previously scored by experts. Just as in the human scoring process, we look at the agreement between IntelliMetric® and the expert scores. If IntelliMetric® meets or exceeds the standards for a human rater, it can be put into use to score new essays.

In short, IntelliMetric® is treated much like any expert scorer on the team when evaluating for consistency and accuracy. It must meet or exceed the same high benchmarks of quality that any human expert must meet.
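The qualification test described above applies the same gate to a human trainee and to the engine. A minimal sketch follows. The required rate of 0.7 and all scores are hypothetical, since the source does not state the actual benchmark.

```python
def exact_agreement_rate(assigned, known):
    """Fraction of essays where the assigned score equals the known score."""
    if len(assigned) != len(known) or not known:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(1 for a, k in zip(assigned, known) if a == k)
    return matches / len(known)


def may_score_new_essays(assigned, known, required_rate):
    """Apply the same agreement gate to any scorer, human or automated."""
    return exact_agreement_rate(assigned, known) >= required_rate


# Hypothetical benchmark and made-up scores, for illustration only.
REQUIRED_RATE = 0.7
known_scores = [4, 3, 5, 2, 6, 1, 4, 3, 2, 5]
trainee_scores = [4, 3, 4, 2, 6, 1, 4, 5, 2, 5]  # differs on two essays

print(exact_agreement_rate(trainee_scores, known_scores))  # 0.8
print(may_score_new_essays(trainee_scores, known_scores, REQUIRED_RATE))  # True
```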

How is IntelliMetric® trained?

Much as human raters are trained to score a new prompt, IntelliMetric® is given a training set that includes many essays that were previously scored by experts. IntelliMetric® processes that information to determine what characterizes an essay deserving of each score point as assigned by the experts. IntelliMetric® establishes a scoring program unique to each training set that best predicts the scores the experts would assign to new essays submitted to that same prompt. After the model is created, the IntelliMetric® scores are compared to expert scores on a validation set of essays. These essays have been scored by humans but are not part of the training process. If IntelliMetric® and the experts agree at an acceptable level, the IntelliMetric® model is ready to be put into use to score new essays submitted to that prompt.
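The separation between training essays and validation essays can be sketched as a simple hold-out split. This shows only the data handling, not how IntelliMetric® builds its model. The function name, the 20% validation fraction, and the sample data are assumptions for illustration.

```python
import random


def split_scored_essays(scored_essays, validation_fraction=0.2, seed=0):
    """Split expert-scored essays into (training, validation) lists.

    Validation essays are held out so they play no part in training.
    """
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(scored_essays) < 2:
        raise ValueError("need at least two scored essays to split")
    shuffled = list(scored_essays)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up (essay text, expert score) pairs, for illustration only.
essays = [(f"essay {i}", 1 + i % 6) for i in range(10)]
training_set, validation_set = split_scored_essays(essays)
print(len(training_set), len(validation_set))  # 8 2
```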

How long does it take to train IntelliMetric®?

The actual IntelliMetric® model creation process is fast. It takes considerably more time to create the training set, which requires collecting an appropriate number of essays that are scored by human experts. After the model is created, time is also needed to carefully determine whether the model meets an accepted level of agreement.

Is IntelliMetric® similar to other automated essay scoring products available?

IntelliMetric® is a unique automated scoring engine. Its use of Artificial Intelligence and Natural Language Processing, together with its close modeling of the human rating process, makes it distinct from other scoring engines. Research has shown that the use of two or more raters provides a more accurate final score than the use of a single rater, and as such, IntelliMetric® was developed to include the equivalent of a panel of raters. Specifically, within IntelliMetric®, there are in essence multiple automated scoring systems at work, each using a different approach to scoring. The scores from each “judge” are then used to produce a final IntelliMetric® score.
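The idea of a panel of “judges” can be sketched as follows. The source does not say how the judges’ scores are combined, so the median used here is purely an assumption chosen for illustration, and the function name and sample scores are hypothetical.

```python
from statistics import median


def combine_judge_scores(judge_scores):
    """Combine the scores of several judges into one final score.

    Assumption: the final score is the median of the judges' scores.
    With an even number of judges the median can fall halfway between
    two score points, so halves are rounded up.
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    return int(median(judge_scores) + 0.5)


# Made-up scores from three hypothetical judges for one essay.
print(combine_judge_scores([4, 5, 4]))  # 4
```

A median limits the influence of a single judge that disagrees strongly with the rest, which is one common reason to pool several raters.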

Another key difference between IntelliMetric® and other solutions is its inductive approach to learning how to score essays. Because IntelliMetric® is not rule based or driven by a fixed list of features, the IntelliMetric® engine is able to score submissions that range from as short as one word to very long pieces of writing. IntelliMetric® does not require a special solution for short answers compared to long answers or persuasive essays compared to narrative essays. One engine can do it all!

The most important difference between IntelliMetric® and other automated essay scoring engines is the accuracy of the engine. IntelliMetric® provides unsurpassed essay scoring accuracy.

Does IntelliMetric® score the same as an expert rater?

While IntelliMetric® does not read an essay the same way an expert rater does, IntelliMetric® is able to score as accurately as, and often more accurately than, a human rater. We determine this by comparing how often IntelliMetric® agrees with expert raters with how often expert raters agree with each other. IntelliMetric® has consistently been found to agree with expert raters as often as or more often than human experts agree with each other.

Can IntelliMetric® score on only one rubric?

No. Because IntelliMetric® is a learning engine that learns how to score from a training set of previously scored essays, it is able to score accurately across a variety of rubrics.

Can IntelliMetric® provide domain scores?

Yes. IntelliMetric® is able to score holistically as well as for particular domains of writing. Common domains of writing that IntelliMetric® is used to score include Focus, Organization, Development, Mechanics, Grammar, Voice, and Language Use.

Can IntelliMetric® score essays written in languages other than English?

Yes. IntelliMetric® can be used to score essays written in more than 20 languages, including Chinese.

How accurate is IntelliMetric®?

IntelliMetric® is as accurate as, and often more accurate than, human expert scorers. How do we know this? One way educators evaluate the accuracy of scoring is to look at how often two experts who review a set of papers independently agree with each other on the scores that should be assigned. In most controlled situations using a 6-point scale, two experts will agree with each other within 1 point about 95% of the time. When we look at how often IntelliMetric® scores agree with either of those experts, we find that IntelliMetric® typically agrees with either expert about 97% to 99% of the time.
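The comparison described above can be sketched as a “within 1 point” agreement rate computed for each pair of scorers. The scores below are made up to show the calculation on a 6-point scale; they are not IntelliMetric® results, and the function name is an assumption.

```python
def adjacent_agreement(scores_a, scores_b, tolerance=1):
    """Fraction of essays on which two scorers are within `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    within = sum(1 for a, b in zip(scores_a, scores_b)
                 if abs(a - b) <= tolerance)
    return within / len(scores_a)


# Made-up scores for 20 essays, for illustration only.
expert_1 = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 3, 4, 2, 5]
expert_2 = [1, 2, 3, 4, 5, 6, 3, 3, 4, 4, 1, 2, 3, 4, 5, 6, 3, 4, 4, 5]
engine = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 3, 4, 3, 5]

# Human-human agreement is the baseline the engine is compared against.
print("expert 1 vs expert 2:", adjacent_agreement(expert_1, expert_2))
print("engine vs expert 1:  ", adjacent_agreement(engine, expert_1))
print("engine vs expert 2:  ", adjacent_agreement(engine, expert_2))
```

In this toy data the two experts differ by 2 points on one essay out of 20, so their agreement is 95%. The engine’s score for that essay falls between theirs, so it is within 1 point of both.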

Can IntelliMetric® be tricked?

Yes. Just as an expert rater can sometimes be tricked, IntelliMetric® is also not a perfect system. Since we know IntelliMetric® can be tricked, we have controls in place to catch non-legitimate essays submitted for scoring. At Vantage Learning, we tend to be on the conservative side, flagging a considerable number of essays for expert review to be certain we catch the non-legitimate essays. For any prompt, approximately 5% of the most aberrant essays will be flagged for expert review. We have had great success in being able to identify aberrant essays such as those that are off-topic, off-task, lack proper development, are written in a language other than what was expected, contain bad syntax, copy the question, are inappropriate, or contain messages of harm.
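Flagging roughly the most aberrant 5% of essays for a prompt can be sketched as below. The sketch assumes each essay already has a numeric aberrance value, where higher means more unusual. How IntelliMetric® measures aberrance is not described in the source, so those values, the essay identifiers, and the function name are hypothetical.

```python
from math import ceil


def flag_most_aberrant(aberrance_by_essay, fraction=0.05):
    """Return the ids of the most aberrant essays, most aberrant first.

    `aberrance_by_essay` maps an essay id to a hypothetical aberrance
    value. The count is rounded up, which errs on the side of flagging
    more essays for expert review.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not aberrance_by_essay:
        return []
    n_flagged = ceil(len(aberrance_by_essay) * fraction)
    ranked = sorted(aberrance_by_essay,
                    key=aberrance_by_essay.get, reverse=True)
    return ranked[:n_flagged]


# Made-up aberrance values for 40 essays, for illustration only.
aberrance = {f"essay-{i}": float(i) for i in range(40)}
print(flag_most_aberrant(aberrance))  # ['essay-39', 'essay-38']
```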

Why use IntelliMetric®?

IntelliMetric® is a highly accurate scoring engine. It provides accuracy that is superior to that of other scoring engines currently available. Its combination of artificial intelligence, natural language processing, and training that parallels the training of human expert raters allows IntelliMetric® to achieve an agreement rate that is often higher than the rate achieved between two human raters.

After IntelliMetric® has been trained to score responses written to a specific prompt, it can be used to successfully and consistently score essays. Unlike human scorers, who require considerable time to score each response, IntelliMetric® requires minimal time to score a large number of essays.

The ability to score both short-answer and extended-response items using a multiple-scoring-system approach, provide holistic and domain scores on any rubric, detect non-legitimate responses, and score essays written in a variety of languages makes IntelliMetric® the most comprehensive scoring engine available.