Inspired by having to turn in midterm grades.

“I would teach for free; you have to pay me to give grades.” — Me, not true, but it feels that way at times.

[Image: stock photo of an old report card, with many columns and numbers that make little sense. Caption: An 82.7 percent in Latin Composition, not at all suspiciously precise.]

Someone new to the North American system might be shocked at the amount of grading. Nearly every course consists of many graded activities, which are aggregated to present a final assessment. This final grade is further aggregated with other classes, leading to an average, a GPA, which in turn serves to rank students for a variety of rewards and punishments.

Many teachers have expressed their dislike of grades or “grade-grubbing” students. I have some sympathy. Grades haven’t always been part of education, and the peculiar tripartite US system is relatively recent: we have letter grades A, B, C, D, F (E used to exist, but is now gone), linked percentages, and a four-point number scale used to calculate GPAs.

Conflict of interest statement: my undergraduate institution did not give grades. Instead, we had narrative evaluations. In graduate school, no one took grades seriously (I have no idea what my GPA was). My experience with grading in college comes entirely from having to give grades. Perhaps this explains why grading has always seemed a bit strange to me. Why do we do it?

We grade, I would say, for three reasons: (1) to motivate, (2) to give feedback, and (3) to track and rank students against each other. Grades are terrible at all three.

We grade this way because it is relatively easy, efficient, and can be done for many students. My former university replaced narrative evaluations with standard grades after I left. Grading may be terrible at all three goals, but it is comparatively efficient and gives the appearance of objectivity.

Are grades really terrible? Let’s see if I can support this claim, starting with function (1), motivation. The typical complaint: grades only supply an imperfect extrinsic motivation, whereas intrinsic motivation is superior. This view has some truth, but it does not go far enough. The real issue is that grades perform functions (1) and (3) simultaneously, creating perverse incentives.

A thought experiment: give your students a choice between (a) receiving an A without taking the course and (b) taking the course for a grade. The only chance they have of learning would be choice (b), but I suspect some (many?) would choose (a), preferring not to risk the lower grade. More to the point, choice (a) is rational; function (3) ensures that their grade matters, often quite a lot. This situation is not ideal for learning.

We do not, of course, allow students this choice. Don’t try it: you would likely be fired. The thought experiment points to the fact that it is not some terrible moral failure for students to focus on their final grade; rather, it is a rational response to our system. So long as final grades continue to serve function (3), it is impossible to remove the perverse incentives that favor extrinsic motivation. And you cannot simply drop function (3): the educational system requires it above all else. Where would we be if we couldn’t rank students against each other?

Feedback (2) seems straightforward. A letter grade or a percentage provides abysmally uninformative feedback. We often supplement the grade with better and richer narrative feedback and comments. The grade, in fact, often stands in the way of more meaningful feedback. It can also create perverse incentives for teachers to construct assessments that lend themselves to a grade. Some of us (me) are guilty of designing assessments precisely because they fit with the logic of percentages. Without the requirement to give a grade, what would your feedback look like? Moreover, the grade often obscures and misdirects feedback, making it harder for students to improve and for us to help them.

Function (3) is, in my view, the weirdest and most troublesome. It is also the most rigid and ruthlessly enforced of the grading functions. Whatever academic freedom you have to organize assessments, you are not free from providing a final grade. All students must, almost without exception, receive a final grade. Yet, the final grade itself has very little pedagogical justification. The final grade is, let’s be honest, not really for the student; it is to provide a means to rank students.

The real problem, though, is that grades are terrible at performing function (3) fairly and objectively. I am aware that without some measure like GPAs, rewards and opportunities will often be distributed unequally, typically through networks of privilege and patronage. Function (3) cannot simply be removed without causing other problems. But it would be naive to imagine that just because grades yield an objective-seeming number, they are valid for ranking students. The data on which the numbers are based are unreliable.

Perhaps the best way to illustrate the issue is with an imaginary case. I can imagine a situation in my teaching where I could get close to the ideal grading structure to realize the goal of function (3). This goal requires assessments as objective and commensurate as possible. So, I create assessments in Latin that lend themselves to a binary correct/incorrect evaluation as objectively as possible: tests focusing on the reproduction of forms and vocabulary items. It is possible also, I think, to create similar assessments with case usage and syntax. I could probably even do it with multiple-choice, reading-comprehension questions. But I will exclude more complex tasks that are harder to assess in binary terms, such as translation into and out of the target language. The results must be both objective and commensurate: I must use the same assessments in every course, year after year. All other Latin teachers must use the same tests. This structure would suit function (3) best since its goal is to allow the objective comparison of students.

And even with all this work, Latin still wouldn’t be commensurate with other courses in the sense that an A in Latin equals an A in Organic Chemistry, Calculus, or Classical Music Appreciation. But even its internal consistency hides unfairness. Perhaps my colleague is a better teacher than I am (OK, not “perhaps”), and my students get lower scores. Perhaps we experience a global pandemic that requires classes to shift to distance learning. Perhaps the next year, courses must be taught wearing masks and without group work. Perhaps I am less motivated one semester, and my students suffer. It is surely unfair to rank students against each other when the variability in their scores is due to externals.

And this occurs even when we try as hard as possible to be objective and commensurate. What happens to the validity of the ranking when, as now, we do not try? What happens when the assessments cannot even get close to fitting into appropriately objective structures (and many do not)?

To be clear, I’m trying to discuss function (3) on its own, without reference to how bad it is for pedagogy. This form of assessment would not suit my pedagogical goals. It would be possible to construct, for example, a myth course where assessment was completely through multiple-choice tests. But, honestly, I don’t really care if the students memorize a set of mythical facts. I want them to think about myths in a variety of ways. In fact, I hope that what they learn about interpreting foundation myths, for example, helps them when they encounter them elsewhere, perhaps in contemporary political discourse. I hope they learn ways of seeing that enlarge their world and stay with them long after many of the facts end up, inevitably, in the dimenticatoio.

I do have assessments that allow them to show me how well they can do what I am teaching, though these do not fit into binaries of correct and incorrect. And it is mostly just my judgement about how well they managed it. I would prefer not to give them a grade whose main purpose lies in allowing the student to be ranked against other students. I do not have a choice. But it is even worse: I cannot assess some of my goals, because it is impossible to assess what happens after the class. And when you think about some of my goals, I am the person who should be assessed!

So, yes, grades are often subjective judgments. Sure. But there is little agreement about the fairest way to make those judgments. Should the grade be relative to the student? In other words, should individual students be graded against their earlier performances? Or should the grade be relative to the whole class? That is, should our grades also rank students within the class? Or should grades be given on the basis of a criterion, i.e., relative to some abstract idea of the A? If I were to guess, students and faculty often assume that a criterion basis is the default. Yet, for all that we each do our own thing, the final grade is treated as commensurate: a B is a B is a B.

I’ve seen good arguments for dropping grades from graduate education, primarily because motivation and fine-grained ranking are seen as unnecessary. But why should function (3) be more important in undergraduate than in graduate school? I suspect the reason is the expressed priority of function (3) above (1) and (2). For all that faculty can structure their in-class grading, they cannot opt out of the final grade. This limit on academic freedom seems widely unremarked and unremarkable, suggesting that the final grade is the most important function grading serves, even though it is the least beneficial for the student or teacher. Why should this be so?

A revealing (?) anecdote for comparison: I have a tenured colleague who has long argued against the use of teaching evaluations in tenure and review cases. Teaching evaluations are similar to grades in that they serve more than one function (e.g. feedback and evaluation). Now, nearly everyone believes that student feedback is valuable for teachers; but the fact that teaching evaluations are notoriously biased suggests strongly that they should not be used to evaluate teaching effectiveness. My colleague made a privileged stand and refused to submit her evaluations, explaining her reasoning and how she uses feedback. Her evaluations, incidentally, are very good. What happened?

The review committee was so distressed that they had the provost contact her in order to convince her to submit them. They did not fire her, but they did withhold some of her merit raise. Imagine what the response would be if someone like me, an adjunct, made a principled stand on giving final grades?

I did, in fact, explain in my job review how I structure my classes based on my view of grading as an unreliable and unfair way to rank students. It was much less radical than what I’ve written here, but the response was, as you can imagine, not positive. My fitness for teaching was questioned. Perhaps rightly, given my distrust of that most fundamental element of teaching, faith in the validity of the final grade. How can I be a teacher if I do not believe that the final grade is a valid way to measure and rank student performance? What even is teaching, if not that?

Now, I have sympathy with the need to believe that the rankings are valid. I have sympathy with the committee tasked with evaluating teaching effectiveness. Our world-view is meritocratic. We reward virtue and punish vice. It would be hard to keep doing that if the metrics we use are invalid. It would undermine the whole system!

So much is invested in the validity of the ranking system that I do not think it can be removed. I would be happy to do away entirely with grades, leaving feedback and motivation to individual faculty. But we would have to fundamentally rethink how to fairly allot scarce resources like awards and scholarships. It would also require new mechanisms for identifying and helping struggling students. Neither of these seems insurmountable. What currently seems impossible is overcoming the damnable rage for ranking, the obsessive need for fine-grained systems of classifying, cataloging, and organizing people into hierarchies. And perhaps most difficult of all, we would have to face the possibility that we may be using invalid tools for constructing the hierarchies.

EDIT: A few days after writing this blog post I came across a book edited by Susan Blum called Ungrading: Why Rating Students Undermines Learning (and What to Do Instead) (West Virginia University Press, 2020). If anything I said here resonates with you, I believe that you will find useful and interesting material in it. I did.

