No standards for test scoring
By Todd Farley
Last week, Education Secretary Arne Duncan acknowledged standardized tests are flawed measures of student progress. But the problem is not so much the tests themselves — it's the people scoring them.
Today's exams nearly always include the sort of "open ended" items where students fill up the blank pages with their own thoughts and words.
After a five-minute interview, I got the job of scoring statewide fourth-grade reading comprehension tests. The for-profit testing company that hired me paid almost $8 an hour.
One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster illustrating a rule from the text. We would award one point for a poster that showed a correct rule and zero for one that did not.
The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?
I was not the only one who was confused. Soon several of my fellow scorers were debating my poster, some positing that it clearly showed an understanding of bike safety while others argued that it most certainly did not. I realized then — an epiphany confirmed over years of experience in the testing industry — that the score any student would earn mostly depended on which temporary employee viewed his response.
A few years later, still a part-time worker, I had a similar experience. For one project our huge group spent weeks scoring ninth-grade movie reviews, each of us reading approximately 30 essays an hour (yes, one every two minutes). At one point the woman beside me asked my opinion about the essay she was reading, a review of the X-rated movie "Debbie Does Dallas." The woman thought it deserved a 3 (on a 6-point scale), but she settled on that only after weighing the student's strong writing skills against the "inappropriate" subject matter. I argued the essay should be given a 6, as it was artfully written and also made me laugh my head off.
All of the 100 or so scorers in the room soon became embroiled in the debate. Eventually we concluded that the essay deserved a 6 ("genius"), a 4 (well-written but "naughty"), or a zero ("filth"). The essay was ultimately given a zero.
This kind of arbitrary decision is the rule, not the exception. The years I spent assessing open-ended questions convinced me that large-scale assessment was mostly a mad scramble to score tests, meet deadlines and rake in cash.
There is already much debate over whether the progress that Duncan hopes to measure can be determined by standardized testing at all. But in the meantime, we can give more thought to who scores these tests. We could start by requiring that scoring be done only by professionals who have made a commitment to education — rather than by people like me.
Todd Farley is the author of the forthcoming "Making the Grades: My Misadventures in the Standardized Testing Industry." He wrote this commentary for The New York Times.