Research :: IntelliMetric®
IntelliMetric® is founded on solid research and development spanning more than two decades. More than 350 research studies conducted both in-house and by third-party experts have determined that IntelliMetric® has levels of consistency, accuracy and reliability that meet, and more often exceed, those of human expert scorers. These studies measured the use of IntelliMetric® in a wide variety of content areas and for a variety of assessment purposes, and found that IntelliMetric®:
- Agrees with or exceeds expert human scoring (On average, IntelliMetric® models agree within one point of expert raters 97% to 100% of the time; see the Journal of Technology, Learning and Assessment report.)
- Accurately scores responses across grade levels, subject areas and contexts (Comparisons of IntelliMetric® to expert scoring have been conducted with essays from fourth graders through adults in areas including English Language Arts, History, Social Science, Science, Business, and General Critical Thinking.)
- Shows a strong relationship to other measures of the same writing construct (See the WritePlacer Validity Study presented at AERA for more details.)
- Shows more reliable and more consistent results across samples than human expert scorers (See the IntelliMetric® Accuracy Summary Report for more information.)
White Papers and Research Summaries
Journal of Technology, Learning and Assessment, March 2006
A 2006 study published in the Journal of Technology, Learning and Assessment, a free online journal published by Boston College, states that the "IntelliMetric® system is a consistent, reliable system for scoring AWA essays." (The AWA is the writing portion of the GMAT® assessment.) The study reports two evaluations. The first compared IntelliMetric® to individual human raters, "a Bayesian system employing simple word counts, and a weighted probability model using more than 750 responses to each of six prompts." The second evaluation was larger and compared "the IntelliMetric® system ratings to those of human raters using approximately 500 responses to each of 101 prompts."
The study found IntelliMetric® had "a perfect + adjacent agreement on 96% to 98% and 92% to 100% of instances in evaluations 1 and 2 respectively" and that "correlations of agreement between human raters and the IntelliMetric® system averaged .83 in both evaluations."
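To make the two statistics quoted above concrete, here is a minimal sketch of how "perfect + adjacent agreement" and a score correlation are commonly computed. It assumes integer scores on a single scale, treats "adjacent" as a difference of exactly one point, and uses Pearson's r for the correlation. The function names and the sample scores are hypothetical illustrations, not data or code from the study.

```python
from math import sqrt


def agreement_rates(human, machine):
    """Return (exact, adjacent, exact_plus_adjacent) as fractions of all essays.

    Assumption: scores are integers on the same scale, and "adjacent"
    means the two scores differ by exactly one point.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    exact = sum(h == m for h, m in zip(human, machine)) / n
    adjacent = sum(abs(h - m) == 1 for h, m in zip(human, machine)) / n
    return exact, adjacent, exact + adjacent


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for eight essays (made up for illustration only).
human_scores = [4, 3, 5, 2, 6, 4, 3, 5]
machine_scores = [4, 4, 5, 2, 4, 4, 3, 6]

exact, adjacent, combined = agreement_rates(human_scores, machine_scores)
print(f"exact: {exact:.1%}, adjacent: {adjacent:.1%}, combined: {combined:.1%}")
print(f"Pearson r: {pearson_r(human_scores, machine_scores):.2f}")
```

In this made-up sample, five of eight scores match exactly and two differ by one point, so the combined agreement is 87.5%. A single essay scored two points apart lowers the combined rate but can have a larger effect on the correlation, which is why the two statistics are usually reported together.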
WritePlacer Validity Study
Presented at the annual meeting of the American Educational Research Association, Division D, in Montreal, Canada in 2005, this paper reports the results of a three-year study of a direct writing assessment program incorporating automated essay scoring technology used for college placement. The study, conducted at more than 100 colleges and universities throughout the United States, examines the validity of the instrument using a multi-trait multi-method approach. Several hundred thousand students took the direct writing assessment and one or more other measures as a basis to examine the construct validity of the writing component.
This study examines the validity of the test scores for WritePlacer®, a direct assessment of writing included as one component of The College Board's ACCUPLACER® college placement testing program, and investigates the extent to which WritePlacer® scores relate to other measures of writing and other academic skill areas.
A Comparison of the Accuracy of Bayesian and IntelliMetric® Automated Essay Scoring Methods to Score Essays Written in English
Presented at the 10th Annual National Roundtable Conference in Melbourne, Australia in 2005, this paper investigates the scoring quality of IntelliMetric® compared to Bayesian automated scoring methods. IntelliMetric® has been shown to be an effective tool for scoring essay-type, constructed response questions across K-12, higher education and professional training environments as well as within a variety of content areas and assessment purposes.
The Effects of Inclusion of Native Speakers' Writing Samples on the Domain Scoring Accuracy of Automated Essay Scoring of Writing Submitted by Taiwanese English Language Learners
Presented at the 32nd Annual Conference of the International Association for Educational Assessment in Singapore in May 2006, this paper presents findings regarding the effects of training set composition on the domain scoring accuracy of essays submitted by Taiwanese students scored by an automated essay scoring system. This study compares the accuracy of scoring the same set of essays written by Taiwanese students using two different models: one model using blended native and ELL essays and one using a set of entirely ELL essays.
A Comparison of the Accuracy of Automated Essay Scoring Using Prompt-Specific and Prompt-Independent Training
Presented at the Annual Meeting of the American Educational Research Association in San Francisco, California in April 2006, this paper evaluates the scoring accuracy of models created under the conditions of prompt-specific and prompt-independent training in order to investigate the efficacy of these two approaches.
IntelliMetric® Accuracy Summary Report
Summarizes data regarding the accuracy of IntelliMetric® scoring for upper elementary, middle school, and high school writing prompts, and compares IntelliMetric® scoring to expert human scoring.
Necessary Components for Effective Writing Instruction
Explores successful instructional strategies that can be used to improve the writing skills of students and to increase the amount and frequency of writing.
Effects of Mode of Delivery for Constructed Response and Selected Response Assessments
Examines whether the delivery mode of an assessment, specifically traditional paper and pencil versus computer-based, affects test results. The two delivery modes are compared for both multiple-choice and constructed-response assessments, considering both the people taking the exams and the people grading them.
