Evaluating the Evaluators
In the 1960s, Paul Diederich and others conducted a study of how essays are graded. They managed to talk over 50 people, teachers as well as other professionals, into being graders. The essays were written by high school students. Each grader graded a large number of essays anonymously, rating each on a scale of 1 (worst) to 9 (best).
Surprisingly (or perhaps not so surprisingly for those who have graded essays), every single essay received a wide range of grades. Over one third of the essays received every grade, from 1 to 9. All of the essays received at least five different grades.
But how could this be?
Diederich and his team analyzed the results and found that each grader applied their own criteria when grading. While the experiment did not ask graders to explain their criteria, Diederich found that most graders fell into one of five groups according to the criteria they favored. For example, some graders based grades largely on their perception of the quality of the essay writer’s ideas. Others based their grades on the writer’s mechanics, such as spelling and grammar. Some were mostly focused on organization. Still other graders used different criteria again.
But what does it mean?
This is the hard question. Apparently, a student’s essay grades can vary widely depending on who is doing the grading. This might be mitigated if the graders agree ahead of time on a “cut sheet” that assigns a number of points to each aspect of the essay. The cut sheet, or a summary of it, could be given to the students so that they have a better idea of how they will be graded and what weight the grader will assign to each criterion. Additionally, if the students and the grader work together for some time, the students will learn the grader’s “style” and adapt to it.
Perhaps this is the most important point. If the students are given enough examples of what the grader’s criteria are, their ability to adapt to those criteria will increase dramatically. If they are given few or no examples, the students will be more likely to view their grades as an arbitrary crapshoot.
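To make the cut sheet idea concrete, here is a minimal sketch of how one might work as a weighted score. The three criteria are the ones mentioned above, but the point values, the 0-to-1 rating scale, and the names RUBRIC and score_essay are my own assumptions for illustration. They are not taken from Diederich.

```python
# Hypothetical cut sheet: maximum points per criterion (illustrative values only).
RUBRIC = {
    "ideas": 50,
    "organization": 30,
    "mechanics": 20,
}


def score_essay(ratings, rubric=RUBRIC):
    """Turn per-criterion ratings (each from 0.0 to 1.0) into a point total."""
    missing = set(rubric) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in rubric:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name!r} must be between 0.0 and 1.0")
    # Each criterion contributes its rating times the points the cut sheet allots to it.
    return sum(rubric[name] * ratings[name] for name in rubric)


# One grader's ratings for a single essay (made-up numbers).
ratings = {"ideas": 0.5, "organization": 1.0, "mechanics": 0.75}
print(score_essay(ratings))
```

The weights are written down once and shared, so a grader who cares mostly about mechanics can move the total by at most the 20 points allotted to that criterion, and students can see in advance where the points are.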
Source: Paul B. Diederich, Measuring Growth in English, National Council of Teachers of English, 1974 (this is the source of the facts, not my opinions).