Robert Harrell on Standards Based Assessment

Over the past ten years Robert has posted 1,076 times. Those of us who have been on the PLC for those ten years are 1,076 times richer for Robert’s sharing. Here is the latest from Robert, on grading theory:

Standards-Based Grading and “Power Grading” are concepts that Robert Marzano has popularized. Scott Benedict has worked with both and has information on his website, Teach for June; look under the Articles tab.

Moving to Standards-Based Assessment requires some significant changes in thinking. Most students and teachers – and just about all grading programs – begin with the idea of either accumulating points for a grade or getting a percentage correct. The “standard” percentage bands come in increments of ten until you get to F, which then covers everything from 0% to 59%.

SBA looks at the matter differently and assigns numeric values to performance relative to a standard. Ben’s rubric uses a four-point scale. I use a five-point scale. When I explain it to my students, I compare it to the state tests with which they are familiar. Here’s my scale and a short explanation:
5 = Exceeds the standard (in at least some aspects while meeting the standard in all aspects)
4 = Meets the standard (in all aspects)
3 = Approaches the standard (also called “Basic”, i.e. may partially meet the standard but has areas that do not meet the standard or comes close in all areas)
2 = Falls below the standard (does not meet or approach the standard)
1 = Falls far below the standard (e.g. gets every question wrong; turns in a paper with nothing but a name on it; physical presence in class will get a student a 1)
0 = Presents zero material for evaluation (e.g. does not come to class; does not even hand in a paper or quiz)

The teacher’s problem is how to translate this scale into percentages. Using the “standard” percentage bands utterly skews the results against students. On a performance, I may evaluate a student as “basic,” but the grading scale rates 3 out of 5 as 60%, a D minus. That doesn’t reflect the student’s ability at all. Even 4 out of 5 is 80%, a B minus, and a B minus indicates lower proficiency than a 4 does on the rubric.
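The skew is easy to see if you run rubric scores through the conventional percentage bands. A minimal sketch (the cutoffs are the usual 90/80/70/60, and the helper name is mine, for illustration):

```python
# Map a five-point rubric score onto the conventional percentage scale,
# then onto letter grades with the usual 90/80/70/60 cutoffs.
def standard_letter(percent: float) -> str:
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"

for score in range(5, 0, -1):
    percent = score / 5 * 100
    print(f"{score}/5 -> {percent:.0f}% -> {standard_letter(percent)}")
# A 3/5 ("approaches the standard") becomes 60%, a D;
# a 4/5 ("meets the standard") becomes 80%, a B.
```

On a four-point rubric the distortion is worse still: 3 out of 4 is 75%, a C.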

The correlation to percentages is even more out of whack with the four-point scale.

The real problem is with the percentage scale commonly used in schools. If you study the history of grades and grading in the United States, this becomes quite clear. Grades were introduced at Yale University about 1795, and a four-point scale was used. (Possibly the origin of the 4.0 scale in the US) Until the middle of the twentieth century, standard scales ranged between three and nine divisions with the most common being three: superior, adequate, poor (or some other set of names that reflect this division). Until the second half of the nineteenth century, even these scales were not in widespread use. “Narrative grades” were given instead, in which the instructor described the students’ abilities and proficiencies.

The desire to have a system that was “transferable” between institutions led to the introduction of letter grades, and the system of A-B-C-D-F was adopted. An E existed at one time, but it was dropped for fear that it would be interpreted as “Excellent.” During the first half of the twentieth century, the 100-point scale was popularized, and with the widespread use of computers and computer programs in the 1950s it became dominant. It was introduced and disseminated with no pedagogical justification whatsoever; it went into computers because it is an easy scale for programmers to manipulate. Schools and universities adopted the 100-point scale because it was readily available, not because it was pedagogically sound. In fact, it is highly inaccurate.

Unless every quiz and test consists of 100 questions, the margin for error in accurate grading increases dramatically. On a ten-question quiz, the margin for error in reflecting what students actually know is as great as two letter grades. The scale also introduces a false perception of objectivity and precision. Do we really believe that a grader can distinguish between, say, work that deserves an 89% and work that deserves a 90%? Not unless the test has 100 discrete items, each worth one point. How many teachers do that on a regular basis? Not many. Instead, teachers generally work in increments of five or ten points, introducing both imprecision and subjectivity. Those in themselves aren’t necessarily bad, but when they are disguised as precision and objectivity, they are dishonest and ethically questionable. A three- or five-point scale actually categorizes students more accurately than a 100-point scale does. It would be much more accurate for teachers simply to enter A, B, C, D, or F in their grade books rather than a percentage or number of points. By the way, the A-B-C-D-F scale was introduced by Mount Holyoke College in 1897; in 1898 the college briefly experimented with A, B, C, D, E, F.
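The two-letter-grade margin is simple arithmetic: on a ten-item quiz, each item is worth ten percentage points, a full letter band. A quick sketch assuming the conventional 90/80/70/60 cutoffs (the `letter` helper is mine, for illustration):

```python
# Ten questions at ten points each: a single item spans a whole letter band.
def letter(percent: float) -> str:
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= cutoff:
            return grade
    return "F"

# A student earned 8/10, but the grader was unsure about one borderline item.
for correct in (7, 8, 9):
    print(f"{correct}/10 = {correct * 10}% -> {letter(correct * 10)}")
# One debatable item either way swings the result from C to A:
# a two-letter-grade spread from a single judgment call.
```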

The other problem with “standard” grading is the 60% required just to earn a D. Can we truly distinguish 60 degrees of failure? (I’m counting zero as a degree of failure.) Is failure truly so much more important than success that we weight the grading system toward failure? In what other endeavor do we do this? Baseball players who bat .300 (three hits in every ten at-bats) are stars. What percentage of shots does a star basketball or soccer player make? Except for grading in schools, 50% is average. Perhaps the skewing of our current system arises from the fact that in early applications of the 100-point scale, no one received lower than 50%, so the bottom of the scale was omitted. Thus the modern “innovation” of giving an F 50% is really just a return to earlier practice.

There is a great deal more that can be said about grades and grading, but I hope this helps.

In my own practice, I simply change the scale in the grade book. (Our program allows me to do this.)
100% = A+
99.99 – 80.01 = A
80.00 – 60.01 = B
60.00 – 40.01 = C
40.00 – 20.01 = D
20.00 – 0 = F
So far no one has complained.
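The remapped bands can be written as a small lookup that also shows where the boundaries fall. A sketch under my own reading of the list (the function name is illustrative, and I take the .01 cutoffs to mean each band excludes its lower endpoint):

```python
# Letter grade under the remapped gradebook bands: 100 = A+, then four
# 20-point-wide bands; each band excludes its lower endpoint, per the
# .01 cutoffs in the scale above.
def remapped_letter(percent: float) -> str:
    if percent >= 100:
        return "A+"
    if percent > 80:
        return "A"
    if percent > 60:
        return "B"
    if percent > 40:
        return "C"
    if percent > 20:
        return "D"
    return "F"

for p in (100, 90, 70, 50, 30, 10):
    print(f"{p}% -> {remapped_letter(p)}")
# 100% -> A+, 90% -> A, 70% -> B, 50% -> C, 30% -> D, 10% -> F
```

With bands this wide, a score in the middle of a band now signals the same proficiency the rubric number was meant to convey, rather than the failure the 60%-for-a-D scale would report.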